
Europäisches Patentamt
European Patent Office
Office européen des brevets







(11) Publication number: 0 531 158 A2

(12) EUROPEAN PATENT APPLICATION



(21) Application number: 92308056.8

(22) Date of filing: 04.09.92

(51) Int. Cl.: G06F 7/72

(30) Priority: 05.09.91 JP 225986/91
     18.05.92 JP 124982/92

(43) Date of publication of application: 10.03.93 Bulletin 93/10

(84) Designated Contracting States: AT BE CH DE DK ES FR GB GR IE IT LI LU MC NL PT SE

(71) Applicant: CANON KABUSHIKI KAISHA
     30-2, 3-chome, Shimomaruko, Ohta-ku
     Tokyo (JP)



(72) Inventor: Iwamura, Keiichi, c/o Canon Kabushiki Kaisha
     3-30-2, Shimomaruko
     Ohta-ku, Tokyo (JP)
     Inventor: Yamamoto, Takahisa, c/o Canon Kabushiki Kaisha
     3-30-2, Shimomaruko
     Ohta-ku, Tokyo (JP)

(74) Representative: Beresford, Keith Denis Lewis et al
     BERESFORD & Co. 2-5 Warwick Court High Holborn
     London WC1R 5DJ (GB)



(54) Method of and apparatus for encryption and decryption of communication data.

(57) A method and apparatus which enable a circuit of small scale to perform the high-speed modular multiplication or modular exponentiation necessary for encryption or decryption in cryptic communication. To this end, modular multiplication $Q = A \cdot B \bmod N$ and modular exponentiation $C = M^e \bmod N$ are executed by repeated computation of $Z = U \cdot V \cdot R^{-1} \bmod N$, employing an integer $R$ which is prime to $N$. The repeated computation is executed by repeatedly operating a single circuit or by simultaneously operating a plurality of circuits of the same construction in parallel.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a cryptographic scheme employed in various communication services using a computer network, such as home banking, firm banking, electronic mail service and electronic conference.

More particularly, the present invention is concerned with a cryptographic scheme which encrypts data to be transmitted and decrypts a received cryptogram by using a computation in which two integers A and B are multiplied and the product is divided by a third integer N to determine the residue, i.e., modular multiplication expressed by $A \cdot B \bmod N$, as well as a computation known as modular exponentiation, which is expressed by $C = M^e \bmod N$ (C, M, N and e being integers) and which is executed by repeating the above-mentioned modular multiplication.

Still more particularly, the present invention is concerned with a communication system which conducts cryptic communication by employing various cryptosystems such as the RSA cryptosystem, the ElGamal cryptosystem, the DH type public key distribution system, the ID-based key sharing cryptosystem and the zero-knowledge certificate cryptosystem.

2. Description of the Related Art

In recent years, communication systems using computer networks have made rapid progress, which has given rise to a demand for cryptographic schemes that protect data contents. High-speed cryptographic schemes are essential given the current trend toward greater capacity and higher communication speeds of networks.

Modular exponentiation and modular multiplication are very important computations which are used in various cryptographic schemes. For instance, these computations are used as follows.

Cryptosystems are classified into two types, namely public-key cryptosystems and common-key cryptosystems. A public-key cryptosystem employs different keys for encryption and decryption. The encryption key is opened to the public, while the decryption key is kept confidential. With this system, it is easy to administrate keys, and it is difficult to infer the decryption key from the opened encryption key. Cryptosystems which are based on modular exponentiation and modular multiplication, such as the RSA cryptosystem and the ElGamal cryptosystem, are used most often as public-key cryptosystems.

It has been noted that the public-key cryptosystem has a specific use known as authentication, besides the confidential communication function. Authentication is a function to confirm whether the transmitter of a message is genuine and, hence, is referred to also as digital signature. A digital signature using a cryptosystem prevents unauthorized transmission or forgery because the signature is produced with a secret key which is known only to the person who sends the message. This system is therefore broadly used as an authenticated communication system in banking and financial businesses.

As a kind of common-key cryptosystem, in which the person who transmits the message and the person who receives the message share a key in confidence, the Vernam cryptosystem, in which a random number is added to data, is known. The random number used for such a purpose may be a random number known as a square residue, obtained on the basis of modular exponentiation and modular multiplication.

Such common-key and public-key cryptosystems are often used together with a technique known as a key distribution system or a technique known as a key-sharing system. Among various key distribution systems, the most popular is the DH type key distribution system proposed by Diffie and Hellman. This distribution system also employs modular exponentiation and modular multiplication. Meanwhile, the ID-based key sharing system has attracted attention among the key sharing systems. Modular exponentiation and modular multiplication are also employed in this key sharing system, as well as in most other key sharing systems.

Cryptographic schemes also include a technique which is referred to as zero knowledge certificate. This technique enables a person to convince the other party that the person actually possesses certain knowledge, without disclosing the content of that knowledge at all, i.e., with zero knowledge. Various procedures based on modular exponentiation and modular multiplication are available for this technique.

Under these circumstances, there has been an increasing demand for circuits which perform efficient modular exponentiation and modular multiplication, in order to make it possible to efficiently build up various cryptosystems. Such high-speed modular exponentiation and modular multiplication circuits also contribute to an increase in the speed of various cryptosystems.

As a method of conducting modular multiplication using N as a modulus, a method is known which uses an integer R which is prime to N. For instance, Montgomery, P.L.: "Modular multiplication without trial division", Math. of Computation, Vol. 44, 1985, pp. 519-521, makes it possible to conduct modular multiplication without division, by computing $Q = A \cdot B \cdot R^{-1} \bmod N$ instead of computing $Q = A \cdot B \bmod N$.
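
For illustration, the division-free computation of $A \cdot B \cdot R^{-1} \bmod N$ can be sketched in software as follows. This is only a minimal model under stated assumptions: R is taken to be a power of two (so that reduction modulo R and division by R become a mask and a shift), N is odd, and the function name `montgomery_product` and its parameters are hypothetical names chosen for this sketch, not taken from the cited paper.

```python
def montgomery_product(u: int, v: int, n: int, r_bits: int) -> int:
    """Return u*v*R^-1 mod n for R = 2**r_bits (assumes n odd, n < R, u, v < n)."""
    r = 1 << r_bits
    mask = r - 1
    # -N^-1 mod R; in practice this constant would be precomputed once per modulus
    n_neg_inv = (-pow(n, -1, r)) % r
    t = u * v
    # choose m so that t + m*n is an exact multiple of R
    m = ((t & mask) * n_neg_inv) & mask
    z = (t + m * n) >> r_bits      # exact division by R, done as a shift
    # z < 2n under the stated assumptions, so one conditional subtraction suffices
    return z - n if z >= n else z
```

No trial division by N appears in the loop-free body: the only division is by R, which is a shift under the power-of-two assumption.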

Another technique for achieving higher processing speed is parallel processing, a typical architecture for which is the well-known systolic array. A systolic array executes pipeline-based processing using a plurality of types of processing elements (PE), thus realizing high-speed processing. Furthermore, the control can easily be conducted locally on a PE basis. Thus, a systolic array possesses both regularity of the whole structure and locality on a PE basis, and is known as an architecture which facilitates construction of a large scale processing device such as a VLSI. Parallel processing is considered most suitable for speeding up modular exponentiation and modular multiplication on a large integer, which require a very large scale of processing. Hitherto, however, almost no architecture has been proposed that applies a parallel processing technique such as a systolic array to modular exponentiation and modular multiplication.

An array using the Montgomery technique has been proposed by Even (see Shimon Even: "Systolic modular multiplication", Advances in Cryptology - CRYPTO '90, pp. 619-624, Springer-Verlag).

In order to obtain sufficient security against malicious cryptanalysis, the integer used in modular exponentiation and modular multiplication should have a large number of bits, namely 512 or more. The computational complexity for such a large integer is huge and cannot be dealt with at high speed by an ordinary computer.

Another problem is that, when modular exponentiation is executed by repetition of the Montgomery method, the maximum bit number of the output progressively increases each time the modular multiplication is conducted, so that it is difficult to execute modular exponentiation by a single circuit. The array proposed by Even does not contain any suggestion concerning a PE which would conduct processing when the bit number of the output of modular multiplication has exceeded the bit number of the input value and, hence, cannot fully perform modular exponentiation.

Furthermore, the known Montgomery method requires, as will be detailed later, that separate computations are conducted on A, B and Q before computing $Q = A \cdot B \cdot R^{-1} \bmod N$, thus necessitating a plurality of computing means.

In particular, the array proposed by Even is composed of an array which performs a multiplication $T = A \cdot B$ and an array which performs a modular multiplication $Q = T \cdot R^{-1} \bmod N$ on R which is treated as a constant. Thus, the systolic array of Even is inefficient in that it essentially employs two types of arrays: one for computing T and the other for computing Q. In addition, the systolic array proposed by Even has inferior adaptability because only 1-bit based computation is performed in each PE.

Thus, the known methods involve various drawbacks and cannot provide an efficient modular multiplication circuit.

SUMMARY OF THE INVENTION

Accordingly, an object of the present invention is to provide a computing apparatus which can perform, with a circuit of a small scale, high-speed modular multiplication of an integer having a large number of digits, as well as a communication method which employs encryption/decryption using the apparatus, thereby overcoming the above-described problems of the prior art.

Another object of the present invention is to provide a computing apparatus in which modular multiplication is performed by a plurality of processing elements of the same type so as to facilitate integration of the computing circuit.

Still another object of the present invention is to provide a method in which modular exponentiation and modular multiplication employed in cryptic communication are executed simply by repeating modular multiplication using R which is prime to N, N being the modulus.

A further object of the present invention is to provide a circuit which performs, in accordance with the Montgomery method, high-speed modular exponentiation and high-speed modular multiplication with a reduced circuit scale.

According to one aspect of the present invention, there is provided a cryptic communication method using a communication apparatus which performs encryption or decryption of a communication content by executing a modular multiplication $A \cdot B \bmod N$ of integers A and B by using N as the modulus, the communication apparatus having at least one computing unit which computes and outputs $Z = U \cdot V \cdot R^{-1} \bmod N$ by using an integer R which is prime to N, the method comprising the steps of: inputting to one of the computing units A and a constant $R_R$ which is expressed by $R_R = R^2 \bmod N$, thereby causing the computing unit to output $A_R = A \cdot R_R \cdot R^{-1} \bmod N$; inputting to one of the computing units B and the constant $R_R$, thereby causing the computing unit to output $B_R = B \cdot R_R \cdot R^{-1} \bmod N$; inputting to the computing unit the $A_R$ and $B_R$, thereby causing the computing unit to output $T_R = A_R \cdot B_R \cdot R^{-1} \bmod N$; and inputting to the computing unit the $T_R$ and a constant 1, thereby causing the computing unit to output, as the Q, $T_R \cdot 1 \cdot R^{-1} \bmod N$, whereby the modular multiplication $Q = A \cdot B \bmod N$ is executed.
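
The four uses of the computing unit described in this aspect can be traced with a small software model. The sketch below is illustrative only: `unit` is a hypothetical reference model of the computing unit (it obtains $R^{-1}$ with Python's built-in `pow`, which a hardware unit would not do), and the function and variable names are assumptions made for this example.

```python
def unit(u: int, v: int, n: int, r: int) -> int:
    """Reference model of the computing unit: Z = U*V*R^-1 mod N (R prime to N)."""
    return (u * v * pow(r, -1, n)) % n

def modmul_via_unit(a: int, b: int, n: int, r: int) -> int:
    """Compute A*B mod N using only calls to the unit and the constant R_R."""
    rr = (r * r) % n            # constant R_R = R^2 mod N
    a_r = unit(a, rr, n, r)     # A_R = A*R_R*R^-1 mod N  (equals A*R mod N)
    b_r = unit(b, rr, n, r)     # B_R = B*R_R*R^-1 mod N  (equals B*R mod N)
    t_r = unit(a_r, b_r, n, r)  # T_R = A_R*B_R*R^-1 mod N (equals A*B*R mod N)
    return unit(t_r, 1, n, r)   # Q = T_R*1*R^-1 mod N    (equals A*B mod N)
```

Each intermediate value carries exactly one factor of R, which the final multiplication by the constant 1 removes.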

According to another aspect of the invention, there is provided a cryptic communication method using a communication apparatus which performs encryption or decryption of a communication content by using a modular exponentiation $C = M^e \bmod N$ concerning integers M and e using N as the modulus, the communication apparatus having at least one computing unit which computes and outputs $Z = U \cdot V \cdot R^{-1} \bmod N$ by using, with respect to input data U and V, an integer R which is prime to N, the method comprising the steps of: inputting to one of the computing units M and a constant $R_R$ which is expressed by $R_R = R^2 \bmod N$, thereby causing the computing unit to output $M_R = M \cdot R_R \cdot R^{-1} \bmod N$; representing the binary expression of e by $e = [e_t, e_{t-1}, \ldots, e_1]$, determining the values of $e_i$ starting from the highest order bit; representing the initial value of $C_R$ by $R_R \cdot R^{-1} \bmod N$, inputting $C_R$ and $M_R$ to one of the computing units when $e_i$ is determined to be equal to 1, thereby causing the computing unit to output $C_R \cdot M_R \cdot R^{-1} \bmod N$ as a new $C_R$; determining whether i of the $e_i$ is greater than 1 or not; inputting, when i is greater than 1, $C_R$ as two input data to one of the computing units, thereby causing the computing unit to output, as new value of $C_R$, $C_R \cdot C_R \cdot R^{-1} \bmod N$; and after completion of processing on all $e_i$, inputting the $C_R$ and 1 as a constant to one of the computing units, thereby causing the computing unit to output, as the aimed C, $C = C_R \cdot 1 \cdot R^{-1} \bmod N$, whereby the modular exponentiation $C = M^e \bmod N$ is executed.
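
A minimal software sketch of this exponentiation is given below. It scans e from bit $e_t$ down to $e_1$, which is the order in which squaring $C_R$ after each bit yields $M^e$. The helper `unit` is a hypothetical reference model of the computing unit, and all names are assumptions made for illustration.

```python
def unit(u: int, v: int, n: int, r: int) -> int:
    """Reference model of the computing unit: Z = U*V*R^-1 mod N (R prime to N)."""
    return (u * v * pow(r, -1, n)) % n

def modexp_square_c(m: int, e: int, n: int, r: int) -> int:
    """Compute M^e mod N with C_R squared after each exponent bit e_t .. e_2."""
    rr = (r * r) % n
    m_r = unit(m, rr, n, r)          # M_R = M*R_R*R^-1 mod N
    c_r = unit(rr, 1, n, r)          # initial C_R = R_R*R^-1 mod N
    t = e.bit_length()
    for i in range(t, 0, -1):        # i = t, t-1, ..., 1
        if (e >> (i - 1)) & 1:       # e_i = 1
            c_r = unit(c_r, m_r, n, r)
        if i > 1:
            c_r = unit(c_r, c_r, n, r)
    return unit(c_r, 1, n, r)        # C = C_R*1*R^-1 mod N
```

Only one stored value, $C_R$, is updated inside the loop, so a single storage location besides the fixed $M_R$ is enough in this variant.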

According to still another aspect of the present invention, there is provided a cryptic communication method using a communication apparatus which performs encryption or decryption of a communication content by using a modular exponentiation $C = M^e \bmod N$ concerning integers M and e using N as the modulus, the communication apparatus having at least one computing unit which computes and outputs $Z = U \cdot V \cdot R^{-1} \bmod N$ by using, with respect to input data U and V, an integer R which is prime to N, the method comprising the steps of: inputting to one of the computing units M and a constant $R_R$ which is expressed by $R_R = R^2 \bmod N$, thereby causing the computing unit to output $M_R = M \cdot R_R \cdot R^{-1} \bmod N$; representing the binary expression of e by $e = [e_t, e_{t-1}, \ldots, e_1]$, determining the values of $e_i$ starting from the lowest order bit; representing the initial value of $C_R$ by $R_R \cdot R^{-1} \bmod N$, inputting $C_R$ and $M_R$ to one of the computing units when $e_i$ is determined to be equal to 1, thereby causing the computing unit to output $C_R \cdot M_R \cdot R^{-1} \bmod N$ as a new $C_R$; determining whether i of the $e_i$ is smaller than t or not; inputting, when i is smaller than t, $M_R$ as two input data to one of the computing units, thereby causing the computing unit to output, as new value of $M_R$, $M_R \cdot M_R \cdot R^{-1} \bmod N$; and after completion of processing on all $e_i$, inputting the $C_R$ and 1 as a constant to one of the computing units, thereby causing the computing unit to output, as the aimed C, $C = C_R \cdot 1 \cdot R^{-1} \bmod N$, whereby the modular exponentiation $C = M^e \bmod N$ is executed.
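
The variant in which $M_R$ rather than $C_R$ is squared can be sketched in the same way. In this sketch the exponent is scanned from bit $e_1$ up to $e_t$, which is the order in which repeated squaring of $M_R$ supplies the powers $M^{2^{i-1}}$ needed for each bit. As before, `unit` is a hypothetical reference model and all names are illustrative assumptions.

```python
def unit(u: int, v: int, n: int, r: int) -> int:
    """Reference model of the computing unit: Z = U*V*R^-1 mod N (R prime to N)."""
    return (u * v * pow(r, -1, n)) % n

def modexp_square_m(m: int, e: int, n: int, r: int) -> int:
    """Compute M^e mod N with M_R squared after each exponent bit e_1 .. e_(t-1)."""
    rr = (r * r) % n
    m_r = unit(m, rr, n, r)          # M_R = M*R_R*R^-1 mod N
    c_r = unit(rr, 1, n, r)          # initial C_R = R_R*R^-1 mod N
    t = e.bit_length()
    for i in range(1, t + 1):        # i = 1, 2, ..., t
        if (e >> (i - 1)) & 1:       # e_i = 1
            c_r = unit(c_r, m_r, n, r)
        if i < t:
            m_r = unit(m_r, m_r, n, r)
    return unit(c_r, 1, n, r)        # C = C_R*1*R^-1 mod N
```

Here the update of $C_R$ and the squaring of $M_R$ within one iteration do not depend on each other, so two units could in principle work on them at the same time.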

According to a further aspect of the present invention, there is provided a cryptic communication method which employs encryption or decryption of a communication content by employing a modular multiplication $Q = A \cdot B \bmod N$ for input integers A and B using N as the modulus, the method comprising the steps of: computing $A \cdot R \bmod N$ using the input A and an integer R which is prime to N, thus determining $A_R$ as the computation result; computing $B \cdot R \bmod N$ using the input B and the R, thus determining $B_R$ as the computation result; computing $A_R \cdot B_R \cdot R^{-1} \bmod N$ on the basis of the computing results $A_R$ and $B_R$ and the R, thus determining $T_R$ as the computation result; and computing $T_R \cdot R^{-1} \bmod N$ on the basis of the $T_R$ and the R, thus determining the Q as the computation result; wherein the computation for determining the $T_R$ is executed by successively computing:

$$T_i = (T_{i-1} + A_i \cdot B_R \cdot Y + M_{i-1} \cdot N) / Y$$
$$M_{i-1} = (T_{i-1} \bmod Y) \cdot (-N^{-1} \bmod Y) \bmod Y$$

wherein Y equals $2^v$ and $A_i$ are sections of $A_R$ obtained by dividing $A_R$ for every v bits, where v is an arbitrary integer.
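
The recurrence for $T_i$ can be checked numerically with a short sketch. The text does not fix the order in which the sections $A_i$ are consumed or the starting value, so the following assumes $T_0 = 0$ and takes the v-bit sections starting from the least significant one; under these assumptions the result is congruent to $A \cdot B \cdot Y^{-(k-1)} \bmod N$ for k sections. The function name and parameters are hypothetical.

```python
def digit_serial_product(a: int, b: int, n: int, v: int, num_digits: int) -> int:
    """Iterate T_i = (T_{i-1} + A_i*B*Y + M_{i-1}*N) / Y with Y = 2**v (n odd).

    Assumption of this sketch: T_0 = 0 and A_1 is the least significant
    v-bit section of a. The returned value is not reduced below n.
    """
    y = 1 << v
    n_neg_inv = (-pow(n, -1, y)) % y         # -N^-1 mod Y
    t = 0
    for i in range(num_digits):
        a_i = (a >> (v * i)) & (y - 1)       # next v-bit section of a
        m = ((t % y) * n_neg_inv) % y        # M_{i-1}: makes t + m*n divisible by Y
        t = (t + a_i * b * y + m * n) // y   # the division is exact
    return t
```

Because only the low v bits of $T_{i-1}$ enter the computation of $M_{i-1}$, each step needs a small multiplier rather than one as wide as N.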

According to a still further aspect of the present invention, there is provided a cryptic communication method which employs encryption or decryption of a communication content by employing a modular multiplication $Q = A \cdot B \bmod N$ for input integers A and B using N as the modulus, the method comprising the steps of: computing $A \cdot R \bmod N$ using the input A and an integer R which is prime to N, thus determining $A_R$ as the computation result; computing $B \cdot R \bmod N$ using the input B and the R, thus determining $B_R$ as the computation result; computing $A_R \cdot B_R \cdot R^{-1} \bmod N$ on the basis of the computing results $A_R$ and $B_R$ and the R, thus determining $T_R$ as the computation result; and computing $T_R \cdot R^{-1} \bmod N$ on the basis of the $T_R$ and the R, thus determining the Q as the computation result; wherein the computation for determining the $T_R$ is executed by successively computing:

$$T_i = (T_{i-1}/Y + A_i \cdot B_R) + M_i \cdot N$$
$$M_i = ((T_{i-1}/Y + A_i \cdot B_R) \bmod Y) \cdot (-N^{-1} \bmod Y) \bmod Y$$

wherein Y equals $2^v$ and $A_i$ are sections of $A_R$ obtained by dividing $A_R$ for every v bits, where v is an arbitrary integer.

According to a still further aspect of the present invention, there is provided a communication apparatus which performs encryption or decryption of a communication content by executing a modular multiplication $A \cdot B \bmod N$ of integers A and B by using N as the modulus, the communication apparatus comprising: first computing means for computing $A_R = A \cdot R_R \cdot R^{-1} \bmod N$, upon receipt of A and a constant $R_R$ which is expressed by $R_R = R^2 \bmod N$, where R is an integer prime to N; second computing means for computing $B_R = B \cdot R_R \cdot R^{-1} \bmod N$ upon receipt of the constant $R_R$ and B; third computing means for computing $T_R = A_R \cdot B_R \cdot R^{-1} \bmod N$ upon receipt of $A_R$ and $B_R$ output from the first and second computing means; and fourth computing means for computing $T_R \cdot 1 \cdot R^{-1} \bmod N$ and outputting the computation result as the Q, upon receipt of $T_R$ output from the third computing means and a constant 1.

According to a still further aspect of the present invention, there is provided a communication apparatus which performs encryption or decryption of a communication content by using a modular exponentiation $C = M^e \bmod N$ concerning integers M and e using N as the modulus, the communication apparatus comprising: first computing means for computing $M_R = M \cdot R_R \cdot R^{-1} \bmod N$ upon receipt of M and a constant $R_R$ which is expressed by $R_R = R^2 \bmod N$; first determining means for determining the values of $e_i$ starting from the highest order bit, wherein the binary expression of e is expressed by $e = [e_t, e_{t-1}, \ldots, e_1]$; storage means for updating and storing the value of $C_R$ by using $C_R = R_R \cdot R^{-1} \bmod N$ as the initial value; second computing means which receives the $C_R$ stored in the storage means and $M_R$ computed by the first computing means when $e_i$ is determined to be equal to 1, and outputs $C_R \cdot M_R \cdot R^{-1} \bmod N$ as a new $C_R$; second determining means for determining whether i of $e_i$ is greater than 1; third computing means for receiving $C_R$ when i is determined by the second determining means to be greater than 1, and outputting, as new value of $C_R$, $C_R \cdot C_R \cdot R^{-1} \bmod N$; and fourth computing means which computes, upon receipt of $C_R$ stored in the storage means and 1 as a constant, $C = C_R \cdot 1 \cdot R^{-1} \bmod N$ after completion of computations performed by the second and third computing means on all the values of $e_i$, thereby outputting the computation result as the C.

According to a still further aspect of the present invention, there is provided a communication apparatus which performs encryption or decryption of a communication content by using a modular exponentiation $C = M^e \bmod N$ concerning integers M and e using N as the modulus, the communication apparatus comprising: first computing means for computing $M_R = M \cdot R_R \cdot R^{-1} \bmod N$ upon receipt of M and a constant $R_R$ which is expressed by $R_R = R^2 \bmod N$; first determining means for determining the values of $e_i$ starting from the lowest order bit, wherein the binary expression of e is expressed by $e = [e_t, e_{t-1}, \ldots, e_1]$; first storage means for updating and storing the value of $C_R$ by using $C_R = R_R \cdot R^{-1} \bmod N$ as the initial value; second storage means for updating and storing the value of $M_R$ using the output of the first computing means as the initial value; second computing means which receives the $C_R$ stored in the first storage means and $M_R$ computed by the first computing means when $e_i$ is determined to be equal to 1, and outputs $C_R \cdot M_R \cdot R^{-1} \bmod N$ as a new $C_R$; second determining means for determining whether i of $e_i$ is smaller than t; third computing means for receiving $M_R$ stored in the second storage means when i is determined by the second determining means to be smaller than t, and outputting, as new value of $M_R$, $M_R \cdot M_R \cdot R^{-1} \bmod N$; and fourth computing means which computes, upon receipt of $C_R$ stored in the first storage means and 1 as a constant, $C = C_R \cdot 1 \cdot R^{-1} \bmod N$ after completion of computations performed by the second and third computing means on all the values of $e_i$, thereby outputting the computation result as the C.

Other objectives and advantages besides those discussed above shall be apparent to those skilled in the art from the description of a preferred embodiment of the invention which follows. In the description, reference is made to accompanying drawings, which form a part hereof, and which illustrate an example of the invention. Such example, however, is not exhaustive of the various embodiments of the invention, and therefore reference is made to the claims which follow the description for determining the scope of the invention.

BRIEF DESCRIPTION OF THE INVENTION

Fig. 1 is an illustration of an example of a modular multiplication circuit in a communication system which employs a cryptosystem;
Fig. 2 is an illustration of an example of encryption/decryption apparatus;
Figs. 3, 12, 19, 24, 27, 29, 33 and 35 are illustrations of circuits of processing element (PE) which conducts modular multiplication;
Figs. 4 to 10, 13, 14, 16, 25, 28, 30, 31, 34, 36 and 37 are illustrations of examples of computing apparatus which employ PEs;
Fig. 11 is an illustration of a PE which performs modular multiplication on a finite field;
Figs. 15 to 17 and 19 are illustrations of examples of PE which performs modular multiplication for RSA cryptosystem;
Fig. 20 is a circuit diagram showing the construction of a multi-processing circuit using SRC (systolic RSA chip);
Fig. 21 is a circuit diagram showing an example of a modular multiplication circuit;
Fig. 22 is a circuit diagram showing an example of a modular exponentiation circuit;
Fig. 23 is a block diagram showing an example of the construction of the modular multiplication circuit;
Fig. 26 is an illustration of a common PE;
Fig. 32 is a circuit diagram showing the construction of a circuit which executes modular exponentiation and modular multiplication using SYMC; and
Fig. 38 is a circuit diagram showing the construction of a circuit which executes modular exponentiation and modular multiplication by using MEC.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[Cryptic Communication System]

A description will be given of a cryptic communication system in a communication network shown in Fig. 1. The connection diagram of Fig. 1 shows a local communication network such as a LAN (Local Area Network) or a large-area communication network such as a telephone communication network. Symbols A to Z indicate users, to each of whom is allocated a communication apparatus or terminal T for connection to the network. An encryption apparatus is adapted to encrypt received information and output the encrypted information. For instance, the arrangement may be such that each communication terminal T incorporates the encryption apparatus so that encrypted information is output from each terminal T, or such that the encryption apparatus is connected between each terminal T and the network so that the output of the communication terminal T is delivered to the network after encryption. The encryption apparatus also may be incorporated in an apparatus which is connected to each communication terminal T and which delivers information to the associated communication terminal T. It is not essential that the encryption apparatus is always connected to the communication terminal. Namely, the arrangement may be such that the encryption apparatus is incorporated in a portable device such as an IC card so that it may be optionally connected, when required, to the communication terminal T or an apparatus connected to the terminal T. By using such an encryption apparatus, it is possible to conduct cryptic communication such as a confidential communication, an authentication communication, key sharing or zero knowledge certificate communication.

This encryption apparatus necessitates a modular multiplication circuit or a modular exponentiation circuit. Such a circuit may be a later-mentioned modular exponentiation circuit of the type which, upon receipt of a plain text M, outputs a cryptogram $C = M^e \bmod N$, where e and N are values which are input separately from the message M or stored beforehand in a memory. In such a case, the modular exponentiation apparatus itself constitutes the encryption apparatus. In case of a confidential communication, decryption may be performed by a similar encryption apparatus which performs a computation $M = C^d \bmod N$ which is the reverse of the above-mentioned encryption computation.

The arrangement also may be such that the modular multiplication circuit or the modular exponentiation circuit is constructed as a part of the encryption apparatus. In such a case, the circuit conducts the computation on information input to the encryption apparatus from the exterior or on the result of processing performed by another processing unit incorporated in the same encryption apparatus.

An access to a recording medium such as a magnetic disk may be regarded as a kind of communication. In this case, the accessing device which makes access to the recording medium is considered as being a communication terminal. A storage system, therefore, also can utilize a cryptosystem by employing the circuit of the invention, as is the case with an ordinary communication system.

A description will now be given of a communication method using the RSA cryptosystem. Encryption and decryption are respectively represented by the following formulae:

Encryption: $C = M^e \bmod N$
Decryption: $M = C^d \bmod N$

wherein M represents a plain text to be transmitted, C indicates a cryptogram, e indicates an encryption key opened to the public, d indicates a decryption key and N represents a modulus which is opened to the public.
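
As a small numerical illustration of these two formulae, the following sketch uses a commonly cited toy parameter set, $N = 61 \times 53 = 3233$, $e = 17$, $d = 2753$. These values are assumptions chosen for the example only and are far too small for real use; the variable names are likewise illustrative.

```python
# Toy RSA parameters (illustrative only): N = 61 * 53, e*d = 1 mod 3120
n, e, d = 3233, 17, 2753

m = 65                      # plain text M, must be smaller than N
c = pow(m, e, n)            # encryption: C = M^e mod N
recovered = pow(c, d, n)    # decryption: M = C^d mod N
```

Both directions are the same operation with different exponents, which is why one modular exponentiation circuit can serve for encryption and decryption.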

Thus, encryption and decryption of the RSA cryptosystem can be executed by modular exponentiation circuits which have constructions similar to each other. The following description, therefore, mainly refers to encryption.

The modular exponentiation $C = M^e \bmod N$ may be conducted simply by repeating modular multiplication of two numbers. When M and e are large, however, the amount of computation becomes huge. According to the invention, therefore, computation is executed in accordance with the following algorithm. In the algorithm shown below, e is an integer having k bits and is expressed by:

$e = [e_k, e_{k-1}, \ldots, e_2, e_1]$




Algorithm B

INPUT M, e, N                                  (input)
$C = 1$                                        (initial set)
For i = k to 1
    If $e_i = 1$ Then $C = C \cdot M \bmod N$  (computation 1)
    If $i > 1$ Then $C = C \cdot C \bmod N$    (computation 2)
Next

15 

In this case, therefore, the modular exponentiation is conducted by repeating the modular multiplication $C = C \cdot B \bmod N$ (B is M or C). An example of a circuit which efficiently performs this modular multiplication will be described hereinunder.
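
Algorithm B can be written directly in software, as in the following minimal sketch. The function name `algorithm_b` is a hypothetical label for this example, and Python's built-in big-integer arithmetic stands in for the modular multiplication circuit.

```python
def algorithm_b(m: int, e: int, n: int) -> int:
    """Modular exponentiation C = M^e mod N by repeated modular multiplication."""
    c = 1                               # initial set
    k = e.bit_length()                  # e = [e_k, ..., e_1]
    for i in range(k, 0, -1):           # For i = k to 1
        if (e >> (i - 1)) & 1:          # If e_i = 1
            c = (c * m) % n             # computation 1
        if i > 1:
            c = (c * c) % n             # computation 2
    return c
```

For a k-bit exponent the loop performs at most 2k - 1 modular multiplications, compared with e - 1 multiplications if M were simply multiplied e times.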

[Example of Construction of Modular Multiplication Circuit]

The following description is based upon an assumption that a condition of $n_a = n_b = n_n = n \cdot m$ exists, for the purpose of simplification of explanation. Computation of $A \cdot B \bmod N = R$, where A, B and N are integers having $n \cdot m$ bits, is conducted as follows. A multiplier which performs a multiplication $a \cdot b = c$ (a and b are integers of small figures of m bits) can be realized by a known device such as, for example, a ROM.

Each of A, B and N is divided into n sections each being of m bits. A, B and N are then expressed as follows:

$$A = A_{n-1} \cdot X^{n-1} + A_{n-2} \cdot X^{n-2} + \cdots + A_1 \cdot X + A_0$$
$$B = B_{n-1} \cdot X^{n-1} + B_{n-2} \cdot X^{n-2} + \cdots + B_1 \cdot X + B_0$$
$$N = N_{n-1} \cdot X^{n-1} + N_{n-2} \cdot X^{n-2} + \cdots + N_1 \cdot X + N_0$$

It is assumed that X equals $2^m$, i.e., $X = 2^m$, and the bit series formed by dividing A, B and N for every m bits from the upper figure are represented by $A_{n-i}$, $B_{n-i}$ and $N_{n-i}$ ($i = 1, \ldots, n$), respectively. Under such conditions, A, B and N can be regarded as polynomials, so that $R = A \cdot B \bmod N$ can be expressed as follows.

$$R = A \cdot B - Q \cdot N \quad (Q = [A \cdot B / N])$$

wherein [Z] represents the greatest integer which does not exceed Z.
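
The division into m-bit sections and the relation between R and Q can be illustrated with a short sketch. The helper names `split_digits` and `poly_value` and the sample values are assumptions made for this example.

```python
def split_digits(value: int, m: int, n_digits: int) -> list[int]:
    """Return [V_0, V_1, ..., V_(n-1)], the m-bit sections of value (X = 2**m)."""
    mask = (1 << m) - 1
    return [(value >> (m * i)) & mask for i in range(n_digits)]

def poly_value(digits: list[int], m: int) -> int:
    """Evaluate V_(n-1)*X^(n-1) + ... + V_1*X + V_0 with X = 2**m."""
    return sum(d << (m * i) for i, d in enumerate(digits))

# Sample values (illustrative): n = 4 sections of m = 8 bits each
a, b, n_mod = 0x12345678, 0x0BADF00D, 0xFFFFFFFB
q = (a * b) // n_mod          # Q = [A*B / N]
r = a * b - q * n_mod         # R = A*B - Q*N
```

Evaluating the polynomial at $X = 2^m$ returns the original integer, and r equals the ordinary residue of $A \cdot B$ modulo N.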

It is therefore possible to determine R in accordance with the following procedure.

Algorithm C

$R_0 = 0$
For j = 1 to n
    $R_j = R_{j-1} \cdot X + A_{n-j} \cdot B - Q_{j-1} \cdot N$
    $\phantom{R_j} = R_{j-1} \cdot X + A_{n-j} \cdot B + E_{j-1} - L_{j-1} \cdot X^n$
Next
If $R_n > N$ then $R_n = R_n - Q_n \cdot N$

on conditions of:

$$L_{j-1} = [R_{j-1} \cdot X / X^n] = [R_{j-1} / X^{n-1}]$$
$$Q_{j-1} = [L_{j-1} \cdot X^n / N], \quad Q_n = [R_n / N]$$
$$L_{j-1} \cdot X^n = Q_{j-1} \cdot N + E_{j-1} \quad (E_{j-1} < N)$$

The algorithm C executes mod N for the value $L_{j-1} \cdot X^n$ of the term $R_{j-1} \cdot X$ which exceeds the greatest figure $X^{n-1}$ of N, in order to eliminate the necessity for determination of $R > N$. Namely, mod N computation is executed on the coefficient of R which has exceeded $X^{n-1}$ in terms of bit number, so that it is not necessary to determine whether the condition $R > N$ is met.

Furthermore, instead of execution of $-Q_{j-1} \cdot N$, which is $L_{j-1} \cdot X^n \bmod N$, subtraction of $L_{j-1} \cdot X^n$ and addition of $E_{j-1}$ as the residue are conducted. That is, $L_{j-1} \cdot X^n$ is converted into $E_{j-1}$ and the thus obtained $E_{j-1}$ is added. By this method, all the subtractions made by mod N can be carried out by adding computations. In this case, however, it is necessary to finally conduct computation of $R_n = R_n - Q_n \cdot N$ upon determining $R_n > N$. Such computation, however, is executed finally after completion of the repetition of the above-described computation. This final computation can be executed by, for example, a separate circuit since it need not be conducted in the course of repetition of the above-mentioned computation. Thus, the necessity for this final computation does not affect the processing speed of the whole system.
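
A minimal software sketch of Algorithm C follows. It is an illustration under stated assumptions: the residue $E_{j-1}$ is obtained with Python's `%` operator purely as a stand-in for whatever means the circuit uses to supply it, the operands are assumed to be smaller than $X^n$, and the function and parameter names are hypothetical.

```python
def algorithm_c(a: int, b: int, n_mod: int, m: int, n_digits: int) -> int:
    """Compute A*B mod N section by section (X = 2**m, operands below X**n_digits)."""
    x_n = 1 << (m * n_digits)             # X^n
    top_shift = m * (n_digits - 1)        # shift equivalent to dividing by X^(n-1)
    digit_mask = (1 << m) - 1
    r = 0                                 # R_0 = 0
    for j in range(1, n_digits + 1):
        a_digit = (a >> (m * (n_digits - j))) & digit_mask   # A_(n-j)
        l_prev = r >> top_shift           # L_(j-1) = [R_(j-1) / X^(n-1)]
        e_prev = (l_prev * x_n) % n_mod   # E_(j-1), residue of L_(j-1)*X^n
        # R_j = R_(j-1)*X + A_(n-j)*B + E_(j-1) - L_(j-1)*X^n
        r = (r << m) + a_digit * b + e_prev - l_prev * x_n
    return r % n_mod                      # final step R_n - Q_n*N, done once
```

Inside the loop the part of R that overflows $X^n$ is replaced by its residue, so the only full reduction is the single final one after the loop.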

As the next step, in order to eliminate any delay due to computation of $R_j$, $R_j$ in the algorithm C is divided into $R_{j,n-i}$ and B also is divided into $B_{n-i}$, thus providing the next algorithm D.

Algorithm D

For i = 1 to n

    R_{j,n-i} = D_{j-1,n-i-1} + C_{j-1,n-i-2} + dw_m(A_{n-j}*B_{n-i})
                + up_m(A_{n-j}*B_{n-i-1}) + E_{j-1,n-i}
    D_{j,n-i} = dw_m(R_{j,n-i})
    C_{j,n-i} = up_m(R_{j,n-i})

Next
wherein

    R_{j-1,n}*X^n = Q_{j-1}*N + E_{j-1}, Q_{j-1} = [R_{j-1,n}*X^n / N]   (1)
    E_{j-1} = E_{j-1,n-1}*X^{n-1} + E_{j-1,n-2}*X^{n-2} + ... + E_{j-1,1}*X + E_{j-1,0}   (2)
    D_{0,n-i-1} = C_{0,n-i-2} = E_{0,n-i} = B_n = B_{-1} = 0,

and dw_m(Z) represents the value of the figures of Z not greater than 2^m, while up_m(Z) represents a value obtained by dividing, by 2^{m+1}, the value of Z not smaller than 2^{m+1}.

The algorithms C and D are basically the same, but the algorithm D approximates the operation of the actual circuit more closely than the algorithm C.

[Structural Example of Encryption/Decryption Apparatus] 

Then, an encryption/decryption apparatus which encrypts/decrypts data using the aforesaid algorithms will now be described. Assuming that a plain text to be communicated is M, a cryptogram is C, a public encryption key is e, a decryption key is d, and a public modulo is N, the encryption and the decryption of the RSA cryptograph are expressed by the following modular exponentiations:

Encryption: C = M^e mod N

Decryption: M = C^d mod N

The modular exponentiation C = M^e mod N is calculated in accordance with the aforesaid Algorithm B. Therefore, the modular exponentiation can be realized by repeating the modular multiplication C = C*B mod N (B is M or C). A circuit capable of efficiently executing the algorithm is shown in Fig.2. Referring to Fig.2, reference numerals 101 and 102 represent shift registers for respectively storing the values of M and e. Reference numerals 103 and 104 represent registers for respectively storing the values of N and C. Reference numerals 105 and 106 represent select switches for selecting the inputs and 107 represents a multiplexer for selecting the value of C in the register 104 for each m bits (m is an arbitrary integer) from the upper digits to transmit it in serial. Reference numeral 108 represents a modular multiplication circuit for executing the calculation C = C*B mod N and arranged as shown in Figs. 1 to 10. Reference numeral 109 represents a controller for discriminating whether or not e_i = 1 or i > 1 to control calculations 1 and 2 of the Algorithm B, or controlling a clear signal or a preset signal for the selector and the register at the time of the receipt of the signal or the initialization. The controller 109 can easily be formed by a counter, a ROM and some logic circuits.
Then, the operation of the circuit shown in Fig.2 will now be described.

The circuit receives plain text M, public key e and public modulo N. Therefore, M, e and N are supplied, in serial or in parallel, to the registers 101, 102 and 103, respectively. At this time, the selector 105 selects M to supply M to the register 101. Simultaneously, initialization is performed in such a manner that C = 1 by the clear signal or the preset signal for the register as an alternative to supplying the value of C to the register 104.

After the input and the initialization have been completed, the modular multiplications in accordance with the calculations 1 and 2 are commenced. The difference between the calculation 1 and the calculation 2 lies in the fact that B is M or C in the modular multiplication C = C*B mod N. Therefore, in a case where the calculation 1 is executed, the selector 106 selects serial output M for each m bits from the register 101. In a case where the calculation 2 is executed, the selector 106 selects serial output C for each m bits from the multiplexer 107. The serial output M for each m bits from the shift register 101 is again supplied to the shift register 101 via the selector 105. The modular multiplication circuit 108 is constituted and operated as described above. The output C from the modular multiplication circuit 108 is, in parallel, supplied to the register 104 so as to be used in the next residue multiplication, so that the calculations 1 and 2 are efficiently repeated. If the apparatus is arranged to receive C and d in place of M and e, a cryptogram can be decrypted.
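The repetition of calculations 1 and 2 can be sketched in Python as follows. Algorithm B itself is not reproduced in this passage, so the sketch assumes it is the usual left-to-right binary method: calculation 1 (C = C*M mod N) when e_i = 1, and calculation 2 (C = C*C mod N) for every bit except the last. The function name and the use of the `%` operator in place of the circuit 108 are assumptions of this sketch.

```python
def modexp(m_text, e, n_mod):
    """Left-to-right binary exponentiation built from modular multiplications.

    Hypothetical illustration of the controller 109 sequencing calculations 1 and 2.
    """
    c = 1                                   # register 104 initialised to C = 1
    bits = bin(e)[2:]                       # bits of e, most significant first
    for idx, bit in enumerate(bits):
        if bit == "1":
            c = (c * m_text) % n_mod        # calculation 1: B = M
        if idx < len(bits) - 1:
            c = (c * c) % n_mod             # calculation 2: B = C
    return c % n_mod


# Usage: the same routine encrypts with (M, e) and decrypts with (C, d).
cipher = modexp(65, 17, 3233)
plain = modexp(cipher, 2753, 3233)
```

With the small textbook parameters N = 3233, e = 17 and d = 2753 (chosen here only for illustration), the decryption returns the original value 65.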

[Parallelizing the Calculating Circuit with Processing Element]



Algorithm H

FOR j = 1 TO n
    FOR i = 1 TO n
        R_{j,n-i} = D_{j-1,n-i} + C_{j-2,n-i} + dw_m(A_{n-j} * B_{n-i})
                    + up_m(A_{n-j} * B_{n-i-1}) + E_{j-1,n-i}
        D_{j,n-i} = dw_m(R_{j,n-i})
        S_{j,n-i} = up_m(R_{j,n-i})
        C_{j,n-i} = S_{j,n-i}
    NEXT
NEXT

where
    R_{j-1,n}*X^n = Q_{j-1}*N + E_{j-1}, Q_{j-1} = [R_{j-1,n}*X^n / N]   (6)
    E_{j-1} = E_{j-1,n-1}*X^{n-1} + E_{j-1,n-2}*X^{n-2} + ... + E_{j-1,1}*X + E_{j-1,0}   (7)
    D_{0,n-i-1} = C_{0,n-i-2} = E_{0,n-i} = B_n = B_{-1} = 0

    dw_m(Z) : value smaller than 2^m digit of Z
    up_m(Z) : value obtained by dividing a value larger than 2^{m+1} of Z by 2^{m+1}

Although Algorithm H and Algorithm C are basically the same, Algorithm H is more suitable for parallelizing the circuit. Algorithm H is executed by a circuit shown in Figs.3 and 4.

Fig.3 illustrates a circuit for executing the basic calculation R = R*X + A_{n-j}*B mod N of the residue multiplication, which is called a basic operator (processing element, abbreviated to "PE" hereinafter).

Specifically, it performs the calculation R_{j,n-i} = D_{j-1,n-i} + C_{j-2,n-i} + dw_m(A_{n-j} * B_{n-i}) + up_m(A_{n-j} * B_{n-i-1}) + E_{j-1,n-i} as shown in Algorithm H.
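The way the terms dw_m and up_m line up can be checked with a short Python sketch. It assumes the reading suggested by the register description of the PE, namely that dw_m takes the lower m bits and up_m takes everything above them; the helper names are hypothetical and only the product term A_{n-j}*B of the calculation is modelled.

```python
def dw(z, m):
    """Lower m bits of z (assumed reading of dw_m)."""
    return z & ((1 << m) - 1)


def up(z, m):
    """Bits of z above the lower m bits, shifted down (assumed reading of up_m)."""
    return z >> m


def product_columns(a_digit, b, m, n):
    """Per-digit contributions of a_digit * B, without carry propagation.

    Column k receives dw_m(A*B_k) + up_m(A*B_{k-1}), with B_n = B_{-1} = 0.
    """
    mask = (1 << m) - 1
    b_digits = [(b >> (m * k)) & mask for k in range(n)]
    columns = []
    for k in range(n + 1):
        low = dw(a_digit * b_digits[k], m) if k < n else 0        # B_n = 0
        high = up(a_digit * b_digits[k - 1], m) if k >= 1 else 0  # B_{-1} = 0
        columns.append(low + high)
    return columns


# Usage: re-weighting the columns by X^k restores the full product.
cols = product_columns(0xB, 0x1234, 4, 4)
total = sum(c << (4 * k) for k, c in enumerate(cols))
```

Each column is the sum of two m-bit values, so it can be handed to the adder of a single PE together with the D, C and E inputs.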

Fig.4 illustrates a structure the overall body of which is formed into a so-called systolic array. The systolic array performs the calculation by pipeline processing using PEs, which are small and identical functional blocks.




The PE is formed as shown in Fig.3. The PE shown in Fig.3 comprises an m*m-bit multiplier for calculating A_{n-j}*B_{n-i}, ROMs for respectively transmitting the value of E_{j-1,n-i} from the value of R_{j-1,n} in accordance with Equations (6) and (7), n+1 pieces of 4-input m-bit adders each having a 2-bit carry or 5-input adders, an m+2 bit register for storing R_{j,n-i} (i = 1, ..., n), registers for respectively storing A_{n-j} and Q_{j-1}, and a two-stage register for delaying B_{n-i} and T_{n-i}.

The lower m bits of this register mean the lower m digits (dw_m(R_{j,n-i}) = D_{j,n-i}) of R_{j,n-i}, while the upper 2 bits mean a value (up_m(R_{j,n-i}) = S_{j,n-i}) larger than m+1 digit of R_{j,n-i}. As a result, the carry for each adder can be absorbed at each clock by S_{j,n-i}. Furthermore, S_{j,n-i} is made to be C_{j,n-i} at the right PE before it is, as a carry, added together with the lower m bits of the right register in the second PE counted in the right direction. Therefore, the delay time generated due to the calculation of R_j performed in accordance with Algorithm C can be eliminated.

As described above, D_{j,n-i}, R_{j,n-i} and C_{j,n-i} respectively show the state of the register, where subscript i means the clock and j represents the sequential order in Fig.4. Therefore, the position of the PE from j = 1 (#1) to j = n (#n) from right to left is indicated.

Then, the operations of the structures shown in Figs.3 and 4 will now be described with reference to a timing chart of the circuits shown in Figs.3 and 4 in a case where n = 4.
#1 :  L out | AB6 | AB5 | AB4 | AB3 | 0 | 0 | AB6 | AB5 | AB4 | AB3 |
      B in  | B3 | B2 | B1 | B0 | 0 | B3 | B2 | B1 | B0 | 0 |
      U(A3) | AB7 | AB6 | AB5 | AB4 | 0 | 0 | AB6 | AB5 | AB4 | 0 |
      P     | 0 | 0 |
      E j-1 | 0 | 0 |
      D out | D7 | D6 | D5 | D4 | D3 | D7 | D6 | D5 | D4 | D3 |
      S out | 0 | S7 | S6 | S5 | 0 | 0 | S7 | S6 | S5 | S4 |

#2 :  L out | AB5 | AB4 | AB3 | AB2 | 0 | 0 | AB5 | AB4 | AB3 | AB2 |
      B in  | B3 | B2 | B1 | B0 | 0 | B3 | B2 | B1 | B0 | 0 |
      U(A3) | AB6 | AB5 | AB4 | AB3 | 0 | AB6 | AB5 | AB4 | AB3 | 0 |
      P     | D7+S7 | D7+S7 |
      E j-1 | E6 | E5 | E4 | E3 | 0 | E5 | E4 | E3 | E2 | 0 |
      D out | D6 | D5 | D4 | D3 | D2 | D6 | D5 | D4 | D3 |
      C out | 0 | 0 | C6 | C5 | 0 | 0 | 0 | C6 | C5 | C4 |
      S out | 0 | S7 | S6 | S5 | S4 | 0 | S7 | S6 | S5 | S4 |

#3 :  L out | 0 | AB4 | AB3 | AB2 | AB1 | 0 | AB4 | AB3 |
      B in  | B3 | B2 | B1 | B0 | 0 | B3 | B2 | B1 |
      U(A3) | AB5 | AB4 | AB3 | AB2 | 0 | AB5 | AB4 | AB3 |
      P     | S7; D6+S6 | S7; D6+C6 |
      E j-1 | E5 | E4 | E3 | E2 | 0 | E4 | E3 | E2 |
      D out | D5 | D4 | D3 | D2 | D1 | D5 | D4 |
      C out | 0 | 0 | C5 | C4 | 0 | 0 | 0 | C5 | C4 |
      S out | S6 | S5 | S4 | S3 | 0 | S6 | S5 |

#4 :  L out | 0 | AB4 | AB3 | AB2 | AB1 | 0 |
      B in  | B3 | B2 | B1 | B0 | 0 | B3 |
      U(A3) | AB5 | AB4 | AB3 | AB2 | 0 | AB4 |
      P     | S6; D5+S5 | ... |
      E j-1 | E4 | E3 | E2 | E1 | 0 | E3 |
      D out | D4 | D3 | D2 | D1 | D0 |
      C out | 0 | 0 | C4 | C3 | 0 | 0 | 0 |
      S out | S5 | S4 | S3 | S2 | 0 |

The initial state of each register shown in Figs.3 and 4 is 0. 

When B is supplied from Bin at the first PE (j = 1) for each m bits in an order B3, ..., B0, the multiplier, which receives their values, sequentially transmits A3*B_{n-i} (i = 1, ..., 4). For example, A3*B3 is the coefficient of X^6 in terms of the multiplication of a polynomial, and it also includes the coefficient of X^7 since the aforesaid output is 2m bits. Therefore, the output from the multiplier divided into upper and lower m digits is expressed by ABi (i = 7, ..., 4) in the aforesaid chart because outputs U of the upper m bits are the coefficients of X^7 to X^4. Since outputs Lout of the lower m bits are the coefficients of X^6 to X^3, they are expressed by ABi (i = 6, ..., 3).

The upper m-bit outputs U are supplied to the adders of the same PE, while the lower m-bit outputs Lout are temporarily delayed by one clock by an external register to be D_{0,n-i}, and then added by the adder of the No. 1 PE. Furthermore, this adder adds feedback output Lout supplied from a No. 2 PE, to be described later, to supply the result of this to a register (R_{1,n-i}). At this time, the lower m bits of R_{1,n-i} (i = 1, ..., 4) are, as D_{1,n-i}, transmitted to the next PE. On the other hand, m+1 bits or more, which are the carries, are transmitted as S_{1,n-i} to pass through the next PE before being transmitted to the second PE counted in the forward direction as C_{1,n-i}. Since D_{1,n-i} and C_{1,n-i} are the coefficients of X^7 to X^4 and X^7 to X^5 in terms of the coefficient of the polynomial, they are expressed by D_k (k = 7, ..., 4) and C_k (k = 7, ..., 5). In the aforesaid chart, other signals are likewise expressed by using the coefficient of the polynomial. Furthermore, E_{0,n-i} expressing the residue is 0 and each T_{n-i} (i = 1, ..., 4), which is the timing of the residue, is delayed by 2 clocks by the register before it is transmitted to the next PE.

When B is supplied to the next PE (j = 2) similarly to the case of j = 1, A2*B_{n-i} (i = 1, ..., 4) is transmitted from the multiplier for each upper and lower m bits. At this time the lower m bits are, as Lout, fed back to the No. 1 PE.

The result of an addition of D7 and S7 supplied from the No. 1 PE is, as R_{1,n}, stored in a register P. Then, the value of E_{j-1} obtained from Equation (1) is synchronized with T_{n-i} so as to be sequentially transmitted from the ROM as E_{1,n-i} to the adder. The result of this is, as R_{2,n-i}, supplied to the register so as to be transmitted to the next PE as D_k and S_k.

When B is supplied at the next PE (j = 3), A1*B_{n-i} (i = 1, ..., 4) is transmitted from the multiplier for each upper and lower m bits before U, Lout, Din and E_{j-1} are added similarly to the former PE. Furthermore, C_{1,n-i}, which is the carry from the second former PE, is added, so that the calculations in the Algorithm H are performed. Since U, Lout, Din and E_{j-1} respectively are m bits, the output from the adder is m+2 bits. Therefore, the register of R_{j,n-i} must have m+2 bits. If the carry is 2 bits, the output from the adder is m+2 bits and is therefore not changed if it is added to the adder as a carry.

At the next PE (j = 4), an operation similar to that when j = 3 is performed. As a result, the value stored in each register is R_n.
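The register width stated above can be confirmed by a short worked check in Python. The sketch assumes four m-bit operands and one 2-bit carry, as in the description; the function names are hypothetical.

```python
def max_adder_sum(m, operands=4, carry_bits=2):
    """Largest value the adder can produce from m-bit operands plus a carry."""
    return operands * ((1 << m) - 1) + ((1 << carry_bits) - 1)


def fits_register(m):
    """True if the worst-case sum fits in m+2 bits and its carry part in 2 bits."""
    total = max_adder_sum(m)
    return total < (1 << (m + 2)) and (total >> m) < (1 << 2)


# Usage: the bound holds for every digit width tried here.
all_widths_ok = all(fits_register(m) for m in range(1, 65))
```

The worst case is 4*(2^m - 1) + 3 = 2^(m+2) - 1, which is exactly the largest m+2 bit number, so the carry sent onward never grows beyond 2 bits.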

[Embodiment 2 of Modular Multiplying Circuit Having PE] 

The four registers which receive Bin and Tin in the PE shown in Fig.3 act only to delay the inputs Bin and Tin by 2 clocks. Therefore, the overall size of the circuit can be reduced by arranging the structure in such a manner that the four output registers which concern B and T are omitted from the PE shown in Fig.3 and the values of B and T are stored in different shift registers as shown in Fig.5 to sequentially supply them to each PE at every three registers.

[Embodiment 3 of Modular Multiplying Circuit Having PE] 

Since the calculations to be performed in all of the PEs are the same, a structure may be employed in which A_{n-1} is set to a PE, B is supplied to perform the calculation, the output from the PE is temporarily stored in a memory, A_{n-2} is set to the same PE immediately after the operation of the PE has been completed, B and the output from the memory are fed back to perform the calculation, and the calculations about A_{n-i} (i = 3, ..., n) are repeated in the same manner, so that the residue multiplication can be performed by one PE. If p PEs are used, the number of the feedback operations can be decreased to 1/p, so that the processing speed can be multiplied by p times. Therefore, in the aforesaid method, the size of the circuit and the processing speed can arbitrarily and easily be traded off depending upon the number of the PEs. Then, a method in which the feedback is used will now be described.

First, R_j and B are respectively decomposed to R_{j,n-i} and B_{n-i} similarly to Algorithm H to express it as in Algorithm I:

Algorithm I

FOR h = 1 TO n/p
    FOR k = 1 TO p
        FOR i = 1 TO n
            j = p * (h-1) + k
            R_{j,n-i} = D_{j-1,n-i} + C_{j-2,n-i} + dw_m(A_{n-j} * B_{n-i})
                        + up_m(A_{n-j} * B_{n-i-1}) + E_{j-1,n-i}
            D_{j,n-i} = dw_m(R_{j,n-i})
            S_{j,n-i} = up_m(R_{j,n-i})
        NEXT
    NEXT
NEXT

where
    R_{j-1,n}*X^n = Q_{j-1}*N + E_{j-1}, Q_{j-1} = [R_{j-1,n}*X^n / N]   (8)
    E_{j-1} = E_{j-1,n-1}*X^{n-1} + E_{j-1,n-2}*X^{n-2} + ... + E_{j-1,1}*X + E_{j-1,0}   (9)
    D_{0,n-i-1} = C_{0,n-i-2} = E_{0,n-i} = B_n = B_{-1} = 0

    dw_m(Z) : value smaller than 2^m digits of Z
    up_m(Z) : value obtained by dividing a value larger than 2^{m+1} of Z by 2^{m+1}

With the Algorithm I, the modular multiplication circuit can be realized by the circuit formed as shown in Figs. 3 and 7.

The PE shown in Fig. 3 comprises an m*m bit multiplier for calculating A_{n-j}*B_{n-i}, ROMs for respectively transmitting the value of E_{j-1,n-i} from the value of R_{j-1,n} in accordance with Equations (8) and (9), a 4-input adder of m bits having a 2-bit carry or n+1 5-input adders, registers of m+2 bits for storing R_{j,n-i} (i = 1, ..., n), registers for respectively storing A_{n-j} and Q_{j-1}, and two-stage registers for delaying B_{n-i} and T_{n-i}. The lower m bits of this register mean the lower m digits (dw_m(R_{j,n-i}) = D_{j,n-i}), while the upper 2 bits mean a value (up_m(R_{j,n-i}) = S_{j,n-i}) which is larger than m+1 digits of R_{j,n-i}. As a result, the carry of each adder is absorbed by S_{j,n-i} at each clock. Furthermore, S_{j,n-i} is made to be C_{j,n-i} at the right PE, and then is, as a carry, added together with the lower m bits of the right register at the right PE. Therefore, the delay time generated due to the calculation of R_j as is carried out in the algorithm H can be eliminated. As described above, D_{j,n-i}, R_{j,n-i} and C_{j,n-i} show the states of the registers and subscript i means a clock. Furthermore, k in the Algorithm I denotes the number of the PEs included in one calculating apparatus. Fig. 7 illustrates a calculating apparatus formed by p PEs. Symbol h denotes the number of inputs or feedback inputs to the calculating apparatus shown in Fig. 7.

Fig.8 illustrates a modular multiplying device comprising the calculating apparatus shown in Fig.7, a memory which receives the output from the calculating apparatus to feed it back to the calculating circuit shown in Fig.7, and a control circuit for controlling the aforesaid operations. The control circuit can easily be formed by a counter for counting the clocks and a ROM or the like having the address which stores the outputs.
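The index relation j = p*(h-1) + k of Algorithm I determines which digit of A is loaded into which PE on each pass through the calculating apparatus. A minimal Python sketch of that schedule follows; the function name and the list-of-tuples layout are assumptions made for illustration, and n is assumed to be a multiple of p.

```python
def pe_schedule(n, p):
    """For each pass h, list (PE number k, index of the A digit set to that PE).

    The digit index is n - j with j = p*(h-1) + k, following Algorithm I.
    """
    assert n % p == 0
    passes = []
    for h in range(1, n // p + 1):
        row = []
        for k in range(1, p + 1):
            j = p * (h - 1) + k
            row.append((k, n - j))
        passes.append(row)
    return passes


# Usage for n = 4 and p = 2: two passes through a two-PE apparatus.
schedule = pe_schedule(4, 2)
# schedule == [[(1, 3), (2, 2)], [(1, 1), (2, 0)]]
```

With n = 4 and p = 2 the first pass uses A3 and A2 and the second pass, after feedback through the memory, uses A1 and A0.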

Then, the operations of the structures shown in Figs. 3, 7 and 8 will now be described, using a timing chart adaptable in a case where n = 4 and p = 2.

An assumption is made that the initial state of each register in the structures shown in Figs. 3, 7 and 8 is 0. Furthermore, another assumption is made that A3 is set to the No. 1 PE and A2 is set to the No. 2 PE shown in Fig. 16.

When B is supplied for each m bits in an order B3, ..., B0 at the first PE (j = 1) from Bin, A3*B_{n-i} (i = 1, ..., 4) are sequentially transmitted from the multiplier which receives the aforesaid value. For example, A3*B3 is the coefficient of X^6 in terms of the multiplication of a polynomial, and it also includes the coefficient of X^7 since the aforesaid output is 2m bits. Therefore, the output from the multiplier divided into upper and lower m digits is expressed by ABi (i = 7, ..., 4) in Fig.5 because outputs U of the upper m bits are the coefficients of X^7 to X^4. Since outputs Lout of the lower m bits are the coefficients of X^6 to X^3, they are expressed by ABi (i = 6, ..., 3). The upper m-bit outputs U are supplied to the adders of the same PE, while the lower m-bit outputs Lout are temporarily delayed by one clock by an external register to be D_{0,n-i}, and then added by the adder of the No. 1 PE. Furthermore, the adder adds feedback output Lout supplied from a No. 2 PE, to be described later, to supply the result of this to a register (R_{1,n-i}). At this time, the lower m bits of R_{1,n-i} (i = 1, ..., 4) are, as D_{1,n-i}, transmitted to the next PE. On the other hand, m+1 bits or more, which are the carries, are transmitted as S_{1,n-i} to pass through the next PE before being transmitted to the second PE counted in the forward direction as C_{1,n-i}. Since D_{1,n-i} and C_{1,n-i} are the coefficients of X^7 to X^4 and X^7 to X^5 in terms of the coefficient of the polynomial, they are expressed by D_k (k = 7, ..., 4) and C_k (k = 7, ..., 5). In the aforesaid chart, other signals are likewise expressed by using the coefficient of the polynomial. Furthermore, E_{0,n-i} expressing the residue is 0 and each T_{n-i} (i = 1, ..., 4), which is the timing of the residue, is delayed by 2 clocks by the register before it is transmitted to the next PE.

When B is supplied at the next PE (j = 2) similarly to the case of j = 1, A2*B_{n-i} (i = 1, ..., 4) is transmitted from the multiplier for each upper and lower m bits. At this time, the lower m bits are, as Lout, fed back to the No. 1 PE.

The result of an addition of D7 and S7 supplied from the No. 1 PE is, as R_{1,n}, stored in a register P. Then, the value of E_{j-1} obtained from Equation (1) is synchronized with T_{n-i} so as to be sequentially transmitted from the ROM as E_{1,n-i} to the adder. The result of this is, as R_{2,n-i}, supplied to the register so as to be transmitted to the next PE as D_k and S_k.

Since p = 2, D_k and S_k are sequentially transmitted from the calculating apparatus shown in Fig.7. Since the No. 1 PE is performing the calculations at the time of the commencement of the outputs of D_k and S_k, the outputs D_k and S_k are supplied to the memory to delay them. In this state, the calculation of the No. 1 PE is completed while delaying the output by one clock. Therefore, the one-clock delay is made in the memory to feed back D_k, S_k, B_{n-i} and T_{n-i} to the calculating apparatus shown in Fig. 16. Simultaneously, A1 is set to the No. 1 PE shown in Fig. 16 and A0 is set to the No. 2 PE.

When B is supplied at the next PE (j = 3), A1*B_{n-i} (i = 1, ..., 4) is transmitted from the multiplier for each upper and lower m bits before U, Lout, Din and E_{j-1} are added similarly to the former PE. Furthermore, C_{1,n-i}, which is the carry from the second former PE, is added, so that the calculations in the Algorithm I are performed. Since U, Lout, Din and E_{j-1} respectively are m bits, the output from the adder is m+2 bits. Therefore, the register of R_{j,n-i} must have m+2 bits. If the carry is 2 bits, the output from the adder is m+2 bits and is therefore not changed if it is added to the adder as a carry.

At the next PE (j = 4), an operation similar to that when j = 3 is performed. As a result, the value transmitted from the calculation apparatus shown in Fig.7 is R_n.

[Embodiment 4 of Modular Multiplying Circuit Having PE] 

The four registers which receive Bin and Tin in the PE shown in Fig.3 act only to delay the inputs Bin and Tin by 2 clocks. Therefore, the overall size of the circuit can be reduced by arranging the structure in such a manner that the four output registers which concern B and T are omitted from the PE shown in Fig.3 and the values of B and T are stored in different shift registers as shown in Fig.9 to sequentially supply them to each PE at every three registers.


[Embodiment 5 of Modular Multiplying Circuit Having PE] 

It is apparent that operations can be performed at high speed by longitudinally connecting a plurality of LSIs, in a case where the calculating apparatus shown in Fig.7 is formed by LSIs. In a case where the high speed operation is realized by using q LSIs, it corresponds to multiplying the value of p by q in the Algorithm I. An arrangement in which a modular multiplication is performed by using two LSIs is shown in Fig. 10.

[Embodiment 6 of Modular Multiplying Circuit Having PE] 

The present invention can be used in a modular multiplication on a Galois field as well as a modular multiplication on the integer field. In this case, the structure of the PE shown in Fig.3 must be changed to that shown in Fig.11.

Since there is no carry on the Galois field, signals denoting Cin, Cout, Sin and Sout and signals denoting Lout and Lin can be omitted and thereby the structure can significantly be simplified. Therefore, Bin and Bout, Tin and Tout, and Din and Dout of the PE shown in Fig. 20 are longitudinally connected, so that the residue multiplication on a Galois field can be performed similarly to the aforesaid residue multiplication of integers.

Although E is added as an alternative to - Q*N in the residue calculation according to this embodiment, the modular multiplication circuit of this system may be constituted by a conventional system in which a calculation - Q*N is performed.
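The absence of carries on a Galois field can be illustrated by a small Python sketch of multiplication in GF(2^m), where addition is a bitwise exclusive OR. The sketch is not taken from Fig.11: the function name is hypothetical, operands are bit vectors of polynomial coefficients, and the reduction polynomial used in the usage lines (x^8 + x^4 + x^3 + x + 1) is chosen only as a familiar example.

```python
def gf2_modmul(a, b, poly):
    """Multiply a and b as polynomials over GF(2), reduced modulo poly.

    Assumes b is already reduced (degree below that of poly).
    """
    deg = poly.bit_length() - 1
    assert b < (1 << deg)
    r = 0
    for bit in range(a.bit_length() - 1, -1, -1):   # scan a from the top bit
        r <<= 1                                     # R = R * x
        if (a >> bit) & 1:
            r ^= b                                  # add B without any carry
        if (r >> deg) & 1:
            r ^= poly                               # residue step, again only XOR
    return r


# Usage with the reduction polynomial 0x11B.
product = gf2_modmul(0x57, 0x83, 0x11B)
```

Because every addition is an exclusive OR, no digit ever overflows into its neighbour, which is why the carry signals can be dropped from the PE.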

[Structure of RSA Cryptographic Apparatus Having PE] 

The following methods of raising the processing speed in the RSA cryptographic apparatus are known:
Encryption: the value of the encryption key e is made as small as possible (the minimum is three)
Decryption: the speed is raised by employing the Chinese Remainder Theorem

In a case where the RSA cryptographic apparatus is constituted on the basis of this method, the conventional modular multiplication circuit encounters a problem in that the encryption and the decryption cannot easily be executed by the same modular multiplication circuit because the number of digits of the multiplier and the divisor are different between the encryption and the decryption. Therefore, the modular multiplication is performed by a full software means or by different circuits.

However, an advantage can be obtained from the modular multiplication circuit according to the present invention in that the encryption and the decryption can easily be realized by the same circuit because the number of the digits of the multiplier and that of the divisor are determined on the basis of the number of the operations as an alternative to the size of the circuit. The aforesaid number of the operations can easily be realized by changing the control performed in the control circuit because the number of the feedbacks to the calculating apparatus shown in Fig. 13 is different in the case of the encryption and the decryption.

Furthermore, the calculation of the RSA cryptography to be performed on the basis of the Chinese Remainder Theorem can basically be executed in parallel. Therefore, it is most suitable for use in the method according to the present invention in which the residue multiplication is executed by a plurality of calculating apparatus.
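The parallelism offered by the Chinese Remainder Theorem can be sketched in Python as follows. The sketch uses the standard recombination (Garner's formula) rather than anything specific to the apparatus; the function name, the built-in `pow` in place of the modular multiplication circuit, and the small textbook primes are assumptions for illustration. The modular inverse via `pow(q, -1, p)` needs Python 3.8 or later.

```python
def rsa_decrypt_crt(c, d, p, q):
    """Decrypt c with the Chinese Remainder Theorem (Garner recombination).

    The two half-size exponentiations are independent of each other,
    so they could be given to separate calculating apparatus.
    """
    dp = d % (p - 1)
    dq = d % (q - 1)
    m_p = pow(c, dp, p)            # exponentiation modulo p
    m_q = pow(c, dq, q)            # exponentiation modulo q, independent of m_p
    q_inv = pow(q, -1, p)          # q^-1 mod p (Python 3.8+)
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q


# Usage with N = 61 * 53 = 3233, e = 17, d = 2753.
ciphertext = pow(65, 17, 3233)
recovered = rsa_decrypt_crt(ciphertext, 2753, 61, 53)
```

Each exponentiation works on a modulus of about half the digits of N, which is the case in which the number of digits differs between encryption and decryption.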

[Embodiment 7 of Modular Multiplying Circuit Having PE] 

Then, a residue multiplication of R = A*B mod N (where A is a k-bit integer and B is an m*n-bit integer) will be considered. A divided for each bit and B divided for each m bits can be expressed as follows:

    A = A_{k-1} * 2^{k-1} + A_{k-2} * 2^{k-2} + ... + A_1 * 2 + A_0   (10)
    B = B_{n-1} * X^{n-1} + B_{n-2} * X^{n-2} + ... + B_1 * X + B_0   (11)

Assumptions are made that X = 2^m and that the bit series of A and B obtained by dividing from the upper digit respectively are A_{k-i} (i = 1, ..., k) and B_{n-i} (i = 1, ..., n). In this case, it has been known that the residue multiplication can be performed by repeatedly subjecting j = 1, ..., k to the following calculation:

    R = R*2 + A_{k-j}*B - Q*N   (12)

where Q = [R/N] and the initial value of R is 0.
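Equation (12) can be followed step by step in a short Python sketch. It assumes that Q is the integer quotient of the not yet reduced value R*2 + A_{k-j}*B by N; the function and variable names are hypothetical.

```python
def modmul_bit_serial(a, b, n_mod, k):
    """A*B mod N by scanning the k bits of A from the most significant bit."""
    assert 0 <= a < (1 << k) and n_mod > 0
    r = 0                                   # initial value of R is 0
    for j in range(1, k + 1):
        a_bit = (a >> (k - j)) & 1          # A_{k-j}
        t = r * 2 + a_bit * b               # R*2 + A_{k-j}*B
        q = t // n_mod                      # Q = [R/N] for the unreduced value
        r = t - q * n_mod                   # subtract Q*N
    return r


# Usage: a 16-bit A against an arbitrary B and N.
result = modmul_bit_serial(0xBEEF, 0x1234, 0xC353, 16)
```

Since A_{k-j} is a single bit, the product A_{k-j}*B is either 0 or B, which is what allows the multiplier of each PE to be reduced to AND gates.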

In order to realize this calculation by a systolic array, the aforesaid calculation is expressed by the following 
algorithm: 



Algorithm J

D_{0,n-i-1} = C_{0,n-i-1} = 0
FOR j = 1 TO k
    FOR i = 1 TO n
        R_{j,n-i} = 2 * D_{j-1,n-i} + C_{j-2,n-i} + A_{k-j} * B_{n-i} + E_{j-1,n-i}
        D_{j,n-i} = dw_{m-1}(R_{j,n-i})
        S_{j,n-i} = up_{m-1}(R_{j,n-i})
        C_{j-1,n-i} = S_{j-1,n-i}
    NEXT
NEXT

where
    S_{j-1,n-1} * X^n = Q_{j-1} * N + E_{j-1},   (13)
    E_{j-1} = E_{j-1,n-1}*X^{n-1} + ... + E_{j-1,1}*X + E_{j-1,0}.   (14)

    dw_{m-1}(Z) : value smaller than 2^{m-1} digit of Z
    up_{m-1}(Z) : value obtained by dividing a value larger than 2^m of Z by 2^m

The aforesaid algorithm can be formed by circuits shown in Figs. 12 and 13. The circuit shown in Fig. 13 is formed into a systolic array. The systolic array performs the calculation by pipeline processing using so-called PEs, which are small and identical functional blocks. The PE is formed as shown in Fig. 12.

In Algorithm J, symbol i denotes a clock, and j denotes the number in Fig. 22 which indicates the position of the PE disposed at j = 1 (No. 1) to j = n (No. n) from right to left. It is assumed that each PE shown in Fig. 22 has a value of A_{k-j} (j = 1, ..., k) in the internal register thereof. The No. 1 PE receives B_{n-i} (i = 1, ..., n) at the Bin thereof sequentially starting from the upper digit. Furthermore, it receives timing signal T_{n-i} (i = 1, ..., n) for the residue output from Tin in response to the aforesaid receipt of B_{n-i} (i = 1, ..., n). They are respectively transmitted to the next PE through the corresponding Bout and Tout after they are delayed by the registers. The structure is arranged in such a manner that 0 is set to Din, Sin and Cin of the No. 1 PE. The elements and the operation of the PE will be described.

(1) Multiplying Portion

The multiplying portion of A_{k-j} * B_{n-i} of each PE can easily be realized by m ANDs each transmitting B_{n-i} only when A_{k-j} = 1, because A_{k-j} is 1 bit.

(2) Arithmetic Portion

The adder transmits a carry of 2 bits because it is formed by a 4-input adder which receives output A_{k-j}*B_{n-i} from the multiplier, residue output E_{j-1,n-i}, 2 * D_{j-1,n-i} and C_{j-2,n-i}. Therefore, each register which receives the output from the adder may be formed by an m+2 bit register. Furthermore, the values smaller than m-1 bits of the register R_{j,n-i} of the j-th PE are, as D_{j,n-i}, transmitted to the next PE, while the values larger than m bits are, as C_{j,n-i}, transmitted to the same. However, 2 * D_{j-1,n-i} can be realized by supplying D_{j-1,n-i} to the adder by shifting it by one bit.



(3) Residue Portion



In order to simplify the description, Q_{j-1} is obtained from the values S_{j-1,n-1}, which are larger than N in terms of digits, in place of the value itself. Furthermore, - S_{j-1,n-1} * X^n + E_{j-1} is executed in place of executing - Q_{j-1} * N, so that the residue calculation is performed. The reason for this lies in that S_{j-1,n-1} * X^n = Q_{j-1} * N + E_{j-1} (E_{j-1} < N). Since - S_{j-1,n-1} * X^n is automatically performed due to the overflow of S_{j-1,n-1}, the residue calculation can be completed only by adding E_{j-1}. The addition of E_{j-1} is performed by the following method: since the digit of B_{n-i} and that of E_{j-1,n-i} are the same in an equation which expresses E_{j-1} at the j-th PE, E_{j-1,n-i} (i = 1, ..., n) are sequentially transmitted at timing signal T_{n-i} synchronized with B_{n-i}. Since S_{j-1,n-1} is a three bit number and T_{n-i} is a value denoting n-i, the E_{j,n-i} output circuit can be realized by a ROM having an input of 3 + log (n - i) bits. Furthermore, an m+3 bit register and a selector for receiving and holding S_{j-1,n-1} must be provided.
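The contents of such a ROM can be sketched in Python. The table below is indexed by the overflow value S and the digit position, following the relation S * X^n = Q * N + E; the dictionary layout, the function names and the example modulus are assumptions made for illustration, and N is assumed to be smaller than X^n.

```python
def build_residue_rom(n_mod, m, n, s_bits=3):
    """Table of the digits E_{n-i} of E = (S * X^n) mod N for every S."""
    x = 1 << m
    xn = x ** n
    assert 0 < n_mod < xn
    rom = {}
    for s in range(1 << s_bits):
        e = (s * xn) % n_mod                       # E with E < N
        for digit in range(n):
            rom[(s, digit)] = (e >> (m * digit)) & (x - 1)
    return rom


def residue_from_rom(rom, s, m, n):
    """Reassemble E from the digits read out of the table."""
    return sum(rom[(s, digit)] << (m * digit) for digit in range(n))


# Usage: a 3-bit overflow value and four 4-bit digits.
rom = build_residue_rom(0xC353, 4, 4)
e_for_5 = residue_from_rom(rom, 5, 4, 4)
```

The table has 2^3 * n entries of m bits each, which matches an address made of the 3 bits of S and the digit position supplied by the timing signal.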



(4) Delaying Portion

It is formed by a register for transmitting the values of B_{n-i} and T_{n-i} in a pipeline manner. A register of m bits and log (n - i) bits must be provided for B_{n-i} and T_{n-i}.

Then, a timing chart which illustrates the operation of the circuit shown in Fig. 13 will be described for a case where an assumption is made that k = n = m = 4.

The initial state of each register shown in Figs.12 and 13 is made to be 0.

When B is sequentially supplied for each m bits in an order B3, ..., B0 through Bin at the first PE (j = 1), a3*B_{n-i} (i = 1, ..., 4) are sequentially transmitted from the multiplier which receives the aforesaid input. Expressing the values of these outputs by Dx, D15, D11, D7 and D3 are transmitted from the No. 1 PE. Then, the meaning of Dx will now be considered. Since A can be decomposed for each bit, aj denotes the j+1 th bit of A. Furthermore, since B can be decomposed for each 4 bits from m = 4, Bi denotes bits b_{4(i+1)-1} to b_{4i}, from the 4*(i+1) th digit of B to the 4i+1 th digit. Therefore, aj * Bi denotes the bits from the 4*(i+1)+j th digit to the 4i+j+1 th digit, so that Dx is expressed by D_{4i+j} in terms of the digit. The D_{4i+j} is composed of 4 bits d_{4(i+1)+j-1} to d_{4i+j}.

At the next PE (j = 2), when B is supplied through Bin after one clock delay has been made, a2*B_{n-i} (i = 4, ..., 1) is transmitted from the multiplier. Since the first input D15 has been supplied from Din, residues E14 to E2 are transmitted in accordance with Equation (11) in response to timing signal T_{n-i}, starting from the uppermost digit d18. Ei is also an output for each 4 bits. By adding the output denoting the multiplication, the output denoting the residue and input Din, the outputs of D14 to D2 and S18 to S6 are transmitted from the register. In this state, S18 to S6 are carries generated by the additions and are 2-bit values.

At the next PE (j = 3), a calculation similar to that in the j = 2 PE is performed in such a manner that the residue is generated from S18, which is the uppermost digit, and d17, which is the digit one bit lower, in accordance with Equation (10).

At the final PE (j = 4), a calculation similar to that in the j = 3 PE is performed in such a manner that the carry from the No. 2 PE is further added. The addition at the No. 4 PE is, as shown in the first E12, from the 16-th digit to the 13-th digit. Since the first carry from the No. 2 PE is a 2-bit carry S14, the addition is the 16-th digit and the 15-th digit. Therefore, the additions can be performed by the same adder. The output from the PE is the result of the residue multiplication. Furthermore, one PE shown in Fig.12 is able to calculate A_{k-j} in Equation (12).

[Embodiment 8 of Modular Multiplying Circuit Having PE] 

In the systolic array, the operations performed in all of the PEs are the same, and the input/output relationships between the PEs are the same. Therefore, the systolic array is an architecture in which time sharing of a single circuit can easily be performed.

The simplest structure can be realized by one PE shown in Fig. 12 and a memory. Fig. 14 shows this structure, which is operated as follows:

(1) First, A_{k-1} is set to the PE, and B_{n-i}, T_{n-i} (i = 1, ..., n) are sequentially supplied to the PE. Since one PE performs the calculation of Equation (12) as described above, its output R = A_{k-1}·B is supplied to the memory. Furthermore, the setting of the PE is changed to A_{k-2} immediately after the input of B_{n-i}, T_{n-i} (i = 1, ..., n) has been completed.

(2) R, which is the result of the previous calculation, is fed back to the PE, and B_{n-i} and T_{n-i} are repeatedly supplied as well. As a result, R = R·X + A_{k-2}·B - Q·N is transmitted from the PE, and this output is again stored in the memory.

(3) The setting of the PE is changed to A_{k-j} (j = 3, ..., k) and the operation of (2) is repeated.

Therefore, the modular multiplication can be executed by the circuit shown in Fig. 14 in such a manner that the calculation performed in one pass by k PEs is performed by using one PE k times. If the structure is arranged in such a manner that p PEs are connected in a pipeline and A_{k-j} to A_{k-j+p-1} are continuously set, the modular multiplication can be executed by repeating the calculation by the p PEs k/p times. In other words, the size of the circuit (p PEs) is traded off against the processing speed (k/p passes). As described above, in the circuit formed by the systolic array, the size of the circuit and the processing speed can easily be traded off, and thereby the size of the circuit can be reduced.
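The time-shared operation of steps (1) to (3) can be sketched in software as follows. This is only an illustrative sketch, not the circuit of Fig. 14: the function name `digit_serial_modmul` and the digit width `m` are hypothetical, and the quotient Q is obtained here by an exact integer division, whereas the apparatus derives the residue from a ROM without comparing R against N.

```python
def digit_serial_modmul(a: int, b: int, n: int, m: int = 4) -> int:
    """Compute a*b mod n by feeding one digit of a per pass, most significant first.

    Each loop iteration plays the role of one pass through the single PE:
    R <- R*X + A_{k-j}*B - Q*N, with X = 2**m.
    """
    x = 1 << m                                   # radix X = 2^m (m is an assumed digit width)
    k = max(1, -(-a.bit_length() // m))          # number of m-bit digits of a (ceiling division)
    r = 0                                        # contents of the memory before the first pass
    for j in range(1, k + 1):
        a_digit = (a >> (m * (k - j))) & (x - 1) # A_{k-j}, the value set to the PE for this pass
        r = r * x + a_digit * b                  # R*X + A_{k-j}*B
        q = r // n                               # assumption: exact quotient instead of a ROM estimate
        r -= q * n                               # subtract Q*N and store R back to memory
    return r
```

Using p such loop bodies per pass instead of one would correspond to the p-PE pipeline, with k/p passes in total.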

As described above, the modular multiplication circuit can be formed by the systolic array according to the aforesaid embodiment if only the residue multiplication is required. The structure of the RSA cryptographic apparatus will now be described. The modular multiplication circuit according to the aforesaid embodiment performs the modular multiplication once. The result of the residue multiplication obtainable from this circuit includes a 2-bit carry for each m bits. In a case where the residue multiplication is repeated by using the result of the previous residue multiplication, the residue multiplication cannot be executed by the same circuit unless the carry bits are corrected. Therefore, in a case where the modular multiplication is repeated by using the previous result, as in RSA cryptography, it is critical that the previous result of the modular multiplication can be corrected easily and efficiently.

The series A_c and B_c, each having carry bits, can be divided into A and B shown in Equations (10) and (11) and the carry bit series a and b, and can be expressed as follows:



55 
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Ac = A + a*X 

a - a + ... + a 2 * ffi *X + a m 

$B_c = B + b \cdot X$

$b = b_{n-1} \cdot X^{n-2} + \dots + b_2 \cdot X + b_1$

Therefore, the residue multiplication $R_c = A_c \cdot B_c \bmod N$ for A_c and B_c is expressed as follows:
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[Embodiment 1 of Modular Multiplying Circuit for RSA cryptographic apparatus] 

    D_{0,n-i-1} = 0 ; C_{0,n-i-1} = 0
    FOR s = 1 TO k/m
        FOR c = 1 TO m
            FOR i = 1 TO n
                j = (s - 1)·m + c
                PEA:  R_{j,n-i} = 2·dw_{m-1}(R_{j-1,n-i}) + C_{j-2,n-i}
                                  + A_{k-j}·(B_{n-i} + b_{n-i}) + E_{j,n-i}
                      C_{j-1,n-i} = up_{m-1}(R_{j-1,n-i})
            NEXT
        NEXT
        FOR i = 1 TO n
            PEB:  R_{j,n-i} = dw_m(R_{j-1,n-i}) + C_{j-2,n-i}
                              + a_{k-j}·(B_{n-i} + b_{n-i}) + E_{j,n-i}
                  C_{j-1,n-i} = up_m(R_{j-1,n-i})
        NEXT
    NEXT
    FOR i = 1 TO n
        PEC:  R_{k,n-i} = R_{k,n-i} + C_{k-1,n-i}
              R_{k+1,n-i} = dw_m(R_{k,n-i}) + up_m(R_{k,n-i-1})
              R_{k+2,n-i} = dw_m(R_{k+1,n-i}) + up_m(R_{k+1,n-i-1}) + E_{k+1,n-i}
    NEXT

The process of each of PEA, PEB and PEC shown in the aforesaid algorithm can be realized by the PEs shown in Figs. 15 to 17. Each PE is operated as follows:

PEC: A PE as shown in Fig. 16 is inserted into the final portion of the modular multiplication, and the carry output from this PE is made 1 bit. The PE shown in Fig. 16 first adds outputs Dout, Sout and Cout from the former PE to obtain a value R_{k,n-i}. Then, value C_{k,n-i} of R_{k,n-i}, which is the part at and above the (m+1)-th bit, is delayed by the register so as to be added to a value D_{k,n-i}, which is the part of m bits or fewer. As a result, carry C_{k+1,n-i}, which results from the aforesaid addition, is made 1 bit. However, the carry C_{k+1,n-1} of the uppermost digit is stored in a different register. Then, residue E_{k+1,n-1} of the uppermost digit is calculated from C_{k,n-1} + C_{k+1,n-1}, and D_{k+1,n-1} + C_{k+1,n-2} + E_{k+1,n-1}, which is the uppermost digit of the modular multiplication, is calculated in advance. In the case where a carry is generated in this uppermost digit, a residue of 1 + C_{k,n-1} + C_{k+1,n-1} is transmitted to the calculation of R_{k+2,n-i}, which is the final result, so that control is performed in such a manner that there is no carry bit for the uppermost digit. The discrimination circuit for use in this operation is realized by a 3-bit ROM and an adder.

PEA: The input through Bin is converted from B_{n-i} into B_{n-i} + b_{n-i} in order to correct the carry for B. Therefore, B_{n-i} and b_{n-i} are simultaneously supplied to Bin as shown in Fig. 14, and the AND with A_{k-j} is calculated. Therefore, the number of AND circuits for the multiplying portion of the PE is m + 1. The digit of the output denoting the AND A_{k-j}·b_{n-i} is the same as the digit of the lowermost bit of the output denoting the AND A_{k-j}·B_{n-i}.

PEB: In order to correct the carry for A, one PEB is inserted for each m PEAs. Carry bit a_{k-j} for A is set to the PEB. Since the digit of a_{k-j} is the same as that of A_{k-j}, which has been set to the former PE, the PEB performs the calculation R = R + a_{k-j}·B - Q·N, in which there is no carry, in place of performing the operation shown in the equation. Therefore, output R_{j-1,n-i} from the former PE must be processed in such a manner that the part of m bits or fewer is received, as D_{j-1,n-i}, through Din and the part at and above the (m+1)-th bit is received, as S_{j-1,n-i}, through S'in. Since the carry a_{k-1} from PEC to the uppermost digit is 0, the PEB for the carry for the uppermost digit can be omitted.

Therefore, the RSA cryptography apparatus can be realized by a systolic modular multiplication circuit structured as shown in Fig. 18. Fig. 18 illustrates a structure having one PEB for each m PEAs, in which one PEC is used in place of the PEB in the final portion of the residue multiplication. As a result, if an output from the structure shown in Fig. 17 is supplied to a circuit structured similarly to the systolic array shown in Fig. 18, a similar residue multiplication can be executed.

When the size of the systolic array is desired to be reduced, a circuit having one PE shown in Fig. 19, in which the functions of PEA to PEC, including the PE shown in Fig. 14, are switched by a selector, can be employed, resulting in a similarly small modular multiplication circuit. Since the functions of PEA to PEC are similar to one another, a major portion of the circuits can be used in common. Therefore, the size of the circuit for the PE shown in Fig. 28 can be reduced.

[Embodiment 2 of Modular Multiplying Circuit for RSA Cryptography Apparatus] 

Encryption: the value of the encryption key e is made as small as possible.
Decryption: the speed is raised by employing the Chinese Remainder Theorem.

In a case where the RSA cryptographic apparatus is constituted on the basis of this method, the conventional modular multiplication circuit encounters a problem in that the encryption and the decryption cannot easily be executed by the same modular multiplication circuit, because the number of digits of the multiplier and that of the divisor differ between encryption and decryption. Therefore, the modular multiplication has conventionally been performed entirely by software or by different circuits. However, the modular multiplying method according to the present invention enables the size of the circuit to be traded off easily against the number of operations. Therefore, the difference in the number of digits of the multiplier and of the divisor can be overcome by changing the number of operations, and thereby the encryption and the decryption can easily be realized by the same circuit.

Furthermore, the calculation of the RSA cryptography based on the Chinese Remainder Theorem can basically be executed in parallel. Therefore, it is well suited to the method according to the present invention, in which the modular multiplication is executed by a plurality of calculating apparatuses.
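The parallel structure of decryption based on the Chinese Remainder Theorem can be illustrated by the following sketch. It is a textbook formulation under stated assumptions, not the circuit of this embodiment: the function name `rsa_crt_decrypt` and the toy key (p = 61, q = 53, e = 17, d = 2753) are hypothetical, and Python's built-in `pow` stands in for the modular exponentiation hardware.

```python
def rsa_crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    """Recover m = c**d mod (p*q) from two half-size exponentiations."""
    d_p = d % (p - 1)              # reduced exponent for the mod-p branch
    d_q = d % (q - 1)              # reduced exponent for the mod-q branch
    q_inv = pow(q, -1, p)          # q^-1 mod p (requires Python 3.8+)

    # The two branches are independent of each other, so two calculating
    # apparatuses could evaluate them at the same time.
    m_p = pow(c % p, d_p, p)
    m_q = pow(c % q, d_q, q)

    # Recombination step (Garner's formula).
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q
```

Each branch works with a modulus and an exponent of roughly half the length, which is the difference in the number of digits between encryption and decryption referred to above.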

As described above, the modular multiplication circuit and the RSA cryptographic apparatus can be effi- 
ciently formed. 

As shown in the case where the size of the systolic array is reduced by means of the modular multiplying method according to the present invention, the modular multiplication circuit can be formed by p PEs, p being an arbitrary number. Therefore, the method according to the present invention has the characteristic that the structure can easily be formed into a circuit or a gate array. As a result, by collecting one or more PEs into a chip (hereinafter called an "SRC (Systolic RSA Chip)") and combining the chip with a RAM such that it can be controlled by a program, an RSA cryptographic apparatus can easily be realized. The external program control can be flexibly provided by means of a ROM.

In a case where a high speed operation is required, a plurality of SRCs connected in cascade are used as shown in Fig. 20, so that the number of PEs can be increased. Then, the program is switched over, so that the high speed operation can easily be realized. Therefore, the modular multiplying method according to the present invention can be adapted to multiprocessing.

Consequently, the RSA cryptography apparatus according to the present invention provides the following effects:

Since the calculations performed in one PE are simple integer calculations, the modular multiplying algorithm according to the present invention can be implemented by a CPU or DSP. Therefore, in a case where a cryptographic system is used in low transmission speed systems such as IC cards or telephone lines, a simple RSA cryptography apparatus can be realized by using the CPU or DSP.

The modular multiplication circuit according to the present invention can be formed by an arbitrary number of PEs. Hence, a high speed RSA cryptography process can be realized on one chip by using a C-MOS gate array of about 20K gates or smaller, which can be inexpensively produced by the present semiconductor technology.

Since multiprocessing of the RSA cryptography by means of a plurality of chips can easily be realized, the processing speed can easily be raised in proportion to the number of chips.

Even if the number of digits of the input value in the modular multiplication is very large, it is only necessary to increase the number of PEs, that is, the number of chips. Therefore, satisfactory expandability can be realized.

When the RSA cryptography process is performed in a case where the number of digits differs between encryption and decryption, the size of the circuit and the number of operations can easily be traded off in the modular multiplying method according to the present invention. Therefore, the decryption and the encryption can easily be realized by the same circuit by changing the number of operations, even if there is a difference in the number of digits of the multiplier or the divisor. Therefore, a satisfactory RSA cryptography apparatus can be constituted.

Since the residue calculation is performed in such a manner that E is obtained by means of the ROM in accordance with Equation (14) in order to simplify the structure, a high speed modular multiplication can be executed by a small circuit.

As described above, the calculating apparatus according to the present invention provides the effect that the modular multiplication circuit can be efficiently constituted by the systolic array.

The systolic array performs the modular multiplication in such a manner that a multiplication of many digits is decomposed into small digits (m bits) for each PE, while dispensing with the discrimination of whether or not R < N. Therefore, one clock only needs to cover the time taken for the signal to pass through a multiplying or a dividing ROM, and a high speed pipeline process can be performed.

Furthermore, since the systolic array can be realized by a regular structure composed of the same simple PEs, a large scale circuit such as a VLSI can easily be constituted. In addition, the same control can be applied to each PE, and data is processed in synchronization with the same clock, so that the systolic array can be easily realized.

Since the calculating apparatus comprising a plurality of PEs is free from any limitation on the number of the PEs, the size of the circuit can be freely determined, and thereby it can easily be formed into an LSI apparatus. Furthermore, the calculating apparatus according to the present invention can be realized by a regular structure composed of the same simple PEs. Hence, a VLSI can easily be employed. In addition, the same control can be applied to each PE, and data is processed in synchronization with the same clock, so that the structure can easily be constituted. In addition, even if the number of digits of A and B is large or the processing speed is desired to be further raised, it is only necessary to add PEs or calculating apparatuses. Therefore, satisfactory expandability can be realized.

Since the calculation to be performed in the PE is a simple integer calculation, it can easily be realized by a microprocessor or a digital signal processor.

If m is increased, the size of the circuit is enlarged and the processing speed is raised. Therefore, the size of the circuit and the processing speed can be selected on the basis of the value of m. Furthermore, the trade-off with the processing speed can easily be made. Therefore, an efficient modular multiplication circuit can be provided.

According to the present invention, an effect can be obtained in that the encryption/decryption apparatus for performing communication by means of cryptography can be realized with a small circuit size.

[Modular Multiplication Using Montgomery Method]

A description will now be given of a method of conducting modular multiplication which employs N as the modulus. As an example of such modular multiplication, the description refers to a method proposed by Montgomery (the Montgomery method), which conducts modular multiplication by using an integer R which is prime to N. The description will begin with an explanation of a cryptosystem which employs modular exponentiation and modular multiplication. Then, a description will follow of the processes performed before and after the modular exponentiation and modular multiplication which employ the Montgomery method, as well as of the matching of input and output in the modular multiplication employing the Montgomery method. Further, a description will be made of a PE which executes the Montgomery method and also of a circuit which efficiently executes modular exponentiation and modular multiplication by using a plurality of such PEs in parallel.
The following theorem was introduced by Montgomery:

Theorem 1:

The following formula (21) holds when M = T·N' mod R, where N and R are integers which are mutually prime, T is an arbitrary integer and N' is given by N' = -N^{-1} mod R:

$(T + M \cdot N)/R \equiv T \cdot R^{-1} \pmod N$ (21)

Proof: omitted.
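Formula (21) can be checked numerically with a short sketch such as the following. The function name `montgomery_reduce` and the sample values N = 239 and R = 256 are hypothetical choices made only for illustration.

```python
def montgomery_reduce(t: int, n: int, r: int) -> int:
    """Return (T + M*N)/R with M = T*N' mod R and N' = -N^-1 mod R."""
    n_prime = (-pow(n, -1, r)) % r   # N' = -N^-1 mod R (requires gcd(N, R) = 1)
    m = (t * n_prime) % r            # M = T*N' mod R
    u, remainder = divmod(t + m * n, r)
    assert remainder == 0            # T + M*N is always an exact multiple of R
    return u                         # congruent to T*R^-1 modulo N


n, r = 239, 256
for t in (0, 1, 12345, 50000):
    # The two sides of formula (21) agree modulo N.
    assert montgomery_reduce(t, n, r) % n == (t * pow(r, -1, n)) % n
```

The choice of M makes T + M·N divisible by R, so the division by R leaves no remainder; adding a multiple of N does not change the value modulo N.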

Therefore, the modular multiplication Q = A·B mod N can be executed as follows by using an integer R which is prime to N:

$A_R = A \cdot R \bmod N$ (22)
$B_R = B \cdot R \bmod N$ (23)
$T = A_R \cdot B_R$ (24)
$T_R = T \cdot R^{-1} \bmod N = (T + M \cdot N)/R$ (25)
$Q = T_R \cdot R^{-1} \bmod N$ (26)

The computations of formulae (24) and (25) together will be referred to as Montgomery modular multiplication. The Montgomery modular multiplication can be expressed as follows:

$T_R = A_R \cdot B_R \cdot R^{-1} \bmod N = (A_R \cdot B_R + M \cdot N)/R$ (27)

where

$M = A_R \cdot B_R \cdot N' \bmod R$ (28)

In executing Montgomery modular multiplication with an odd modulus N, R is prime to N when R is chosen as 2^n (n being an arbitrary integer). In this case, the division by R can be performed simply by a bit-shift operation, so that the Montgomery modular multiplication of formula (25) or (27) is executed by multiplication and addition alone, without any division by N.

The processing before and after, that is, the computation of each of formulae (22), (23) and (26), can also be executed by Montgomery modular multiplication as follows:

$A_R = A \cdot R \bmod N = A \cdot R_R \cdot R^{-1} \bmod N$
$B_R = B \cdot R \bmod N = B \cdot R_R \cdot R^{-1} \bmod N$
$Q = T_R \cdot R^{-1} \bmod N = T_R \cdot 1 \cdot R^{-1} \bmod N$

wherein $R_R = R^2 \bmod N$.

R_R is a value which is uniquely determined by N. When N is given, R_R is determined and can be treated as a constant. Therefore, the computations of formulae (22) to (26) are commonly executed by using a computing circuit which performs Z = X·Y·R^{-1} mod N, thus enabling the modular multiplication Q = A·B mod N to be computed, as shown in Fig. 21. Fig. 21 shows that the outputs A_R, B_R, T_R and Q are respectively obtained in response to the sets of inputs (A, R_R), (B, R_R), (A_R, B_R) and (T_R, 1).
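The four uses of the single computing circuit can be sketched as follows. The names `mont_mul` and `mod_mul` are hypothetical, and `mont_mul` reduces its result fully modulo N for clarity, which a hardware circuit would not necessarily do.

```python
def mont_mul(x: int, y: int, n: int, r: int) -> int:
    """Z = X*Y*R^-1 mod N, computed as (T + M*N)/R followed by a full reduction."""
    n_prime = (-pow(n, -1, r)) % r     # N' = -N^-1 mod R
    t = x * y
    m = (t * n_prime) % r
    return ((t + m * n) // r) % n      # assumption: fully reduced for readability


def mod_mul(a: int, b: int, n: int, r: int) -> int:
    """Q = A*B mod N using only the operation Z = X*Y*R^-1 mod N."""
    r_r = (r * r) % n                  # constant R_R = R^2 mod N
    a_r = mont_mul(a, r_r, n, r)       # inputs (A, R_R)   -> A_R
    b_r = mont_mul(b, r_r, n, r)       # inputs (B, R_R)   -> B_R
    t_r = mont_mul(a_r, b_r, n, r)     # inputs (A_R, B_R) -> T_R
    return mont_mul(t_r, 1, n, r)      # inputs (T_R, 1)   -> Q
```

For a single product this costs four passes through the circuit, so the approach pays off mainly when many multiplications are chained, as in modular exponentiation.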

[Montgomery Modular Exponentiation 1] 

Modular exponentiation C = M^e mod N can also be conducted as follows by using the Montgomery method.

    Input M, e, N, R_R
    M_R = M·R_R·R^{-1} mod N                             (29)
    C_R = 1·R_R·R^{-1} mod N                             (30)
    For i = t to 1
        If e_i = 1 then C_R = C_R·M_R·R^{-1} mod N       (31)
        If i > 1 then C_R = C_R·C_R·R^{-1} mod N         (32)
    Next
    C = C_R·1·R^{-1} mod N                               (33)

It is thus possible to carry out modular exponentiation by Montgomery modular multiplication alone. The initial value of C_R in formula (30) can be treated as a constant which is determined by R_R and N. The described modular exponentiation conducted through Montgomery modular multiplication alone will be referred to as Montgomery modular exponentiation.
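The procedure of formulae (29) to (33) can be sketched as follows. The function names are hypothetical, e_t is assumed to be the most significant bit of e, and `mont_mul` is written directly from its definition Z = X·Y·R^{-1} mod N rather than as a model of the circuit.

```python
def mont_mul(x: int, y: int, n: int, r: int) -> int:
    """Z = X*Y*R^-1 mod N, taken directly from the definition."""
    return (x * y * pow(r, -1, n)) % n


def mont_exp_msb_first(m: int, e: int, n: int, r: int) -> int:
    """C = M^e mod N by scanning the exponent from bit e_t down to e_1."""
    r_r = (r * r) % n
    m_r = mont_mul(m, r_r, n, r)             # formula (29)
    c_r = mont_mul(1, r_r, n, r)             # formula (30)
    t = e.bit_length()                       # number of exponent bits
    for i in range(t, 0, -1):
        if (e >> (i - 1)) & 1:               # e_i = 1
            c_r = mont_mul(c_r, m_r, n, r)   # formula (31)
        if i > 1:
            c_r = mont_mul(c_r, c_r, n, r)   # formula (32)
    return mont_mul(c_r, 1, n, r)            # formula (33)
```

Every step, including the conversions at the start and at the end, is a call to the same operation, which is the property that lets a single circuit be reused.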

In the execution of Montgomery modular exponentiation, one computation result is used as the input for the subsequent cycle of computation, so that multiplication is repeated. Execution of the repeated multiplication by a single circuit is difficult when the greatest bit number of the output exceeds the greatest bit number allowed for the input.

The inventors therefore sought a condition for equalizing the greatest bit numbers of the input and output in the Montgomery modular multiplication of formula (27).

Theorem 2:

In formulae (27) and (28), when the conditions A_R < 2^{n+u}, B_R < 2^{n+u}, N < 2^n and R = 2^{n+r} are met, a sufficient condition for meeting the requirement T_R < 2^{n+u} is either u = 1 with r > 1, or u > 1 with r = u + 1.

Proof:

When R = 2^{n+r}, M < 2^{n+r} is derived from formula (28).

When A_R < 2^{n+u}, B_R < 2^{n+u} and N < 2^n, the conditions A_R·B_R < 2^{2(n+u)} and M·N < 2^{2n+r} are met.

The following condition is obtained when the carry is taken into consideration:

$A_R \cdot B_R + M \cdot N < \max(2^{2(n+u)+1},\ 2^{2n+r+1})$

Therefore, $T_R < \max(2^{n+2u+1-r},\ 2^{n+1})$.

Consequently, when the condition $2^{n+2u+1-r} \le 2^{n+1}$ is met, the condition $T_R < 2^{n+1}$ is established.

Therefore, u = 1, r > 1 (34)

Conversely, when $2^{n+2u+1-r} > 2^{n+1}$ is met, the condition $T_R < 2^{n+2u+1-r}$ is met.

Therefore, u > 1, r = u + 1 (35)

In the foregoing explanation, max (A, B) indicates a function which selects the bigger one of A and B.

When either the condition of formula (34) or the condition of formula (35) is met, the Montgomery modular exponentiation can be realized by a simple repetition of Montgomery modular multiplication. It is therefore possible to execute the modular exponentiation simply by selecting the inputs to formulae (29) to (33) by selectors S as shown in Fig. 3.
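The bound stated in Theorem 2 can be spot-checked by random trials, as in the following sketch. The helper names, the 16-bit modulus 0xF123 and the trial count are hypothetical; a passing run is only evidence for the bound, not a proof.

```python
import random


def mont_mul_unreduced(x: int, y: int, n: int, r_bits: int) -> int:
    """(X*Y + M*N)/R with R = 2**r_bits and no final subtraction of N."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r         # N' = -N^-1 mod R
    t = x * y
    m = (t * n_prime) & (r - 1)            # M = T*N' mod R
    return (t + m * n) >> r_bits           # exact division by R as a bit shift


def check_bound(n_mod: int, n_bits: int, u: int = 1, r_extra: int = 2,
                trials: int = 1000, seed: int = 0) -> bool:
    """Return True if every sampled output stays below 2**(n_bits + u)."""
    rng = random.Random(seed)
    limit = 1 << (n_bits + u)              # inputs and outputs must be < 2^(n+u)
    for _ in range(trials):
        a = rng.randrange(limit)
        b = rng.randrange(limit)
        if mont_mul_unreduced(a, b, n_mod, n_bits + r_extra) >= limit:
            return False
    return True
```

For example, `check_bound(0xF123, 16)` exercises condition (34) with u = 1 and r = 2, and `check_bound(0xF123, 16, u=2, r_extra=3)` exercises condition (35).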

In the circuit shown in Fig. 3, each of the selectors S has, as selectable feedback inputs, C_R on the one hand and, on the other hand, a memory for temporarily storing C_R and M_R. Obviously, the arrangement may be such that a single memory which temporarily stores C_R and M_R is provided on the input side of the two selectors S so as to be used commonly by these selectors S.

The switching of the input to such a selector S may employ a shift register adapted for storing e and for successively outputting e_i from the most significant bit, and a control unit which, upon receipt of the output from the shift register, determines whether e_i = 1 and whether i > 1 and delivers a switching signal in accordance with the result of the determination.

In this case, if the requirement of formula (34) or (35) is met, Montgomery modular exponentiation can wholly be accomplished by a simple repetition of Montgomery modular multiplication. However, since u > 0 is derived from formulae (34) and (35), it is necessary that at least C, the computation result, is corrected so as to satisfy the condition C < N.

The known method proposed by Even requires that such a correction be done each time the Montgomery modular multiplication is executed. In contrast, the method of the invention requires only one correction, which is conducted after completion of the modular exponentiation. This correction is a very simple processing and, therefore, does not significantly affect the scale and the processing speed of the circuit which is employed in the Montgomery modular exponentiation described hereinunder.
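The idea of deferring the correction to the very end can be sketched as follows, assuming u = 1 and r = 2 so that R = 2^{n+2} for an n-bit odd modulus. The function names are hypothetical and the code is a software illustration only.

```python
def mont_mul_lazy(x: int, y: int, n: int, r_bits: int) -> int:
    """(X*Y + M*N)/R with R = 2**r_bits; the result may be N or larger."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r
    t = x * y
    m = (t * n_prime) & (r - 1)
    return (t + m * n) >> r_bits           # no comparison with N here


def mont_exp_lazy(m: int, e: int, n: int) -> int:
    """C = M^e mod N with a single correction after the last multiplication."""
    r_bits = n.bit_length() + 2            # assumption: u = 1, r = 2
    r = 1 << r_bits
    r_r = (r * r) % n                      # constant R_R = R^2 mod N
    m_r = mont_mul_lazy(m, r_r, n, r_bits)
    c_r = mont_mul_lazy(1, r_r, n, r_bits)
    for i in range(e.bit_length(), 0, -1): # e_t (most significant bit) first
        if (e >> (i - 1)) & 1:
            c_r = mont_mul_lazy(c_r, m_r, n, r_bits)
        if i > 1:
            c_r = mont_mul_lazy(c_r, c_r, n, r_bits)
    c = mont_mul_lazy(c_r, 1, n, r_bits)
    while c >= n:                          # the only correction in the whole run
        c -= n
    return c
```

Intermediate values are allowed to exceed N but stay below 2^{n+1}, so they can be fed back into the same multiplier without any per-step comparison.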

[Montgomery Modular Exponentiation 2] 

The modular exponentiation C = M^e mod N can also be conducted by the following procedure.

    Input M, e, N, R_R
    M_R = M·R_R·R^{-1} mod N
    C_R = 1·R_R·R^{-1} mod N
    For i = 1 to t
        If e_i = 1 then C_R = C_R·M_R·R^{-1} mod N
        If i < t then M_R = M_R·M_R·R^{-1} mod N
    Next
    C = C_R·1·R^{-1} mod N
It is clear also in this case that C can be computed by a simple repetition of the Montgomery modular multiplication when formula (34) or (35) is satisfied. It is also clear that modular exponentiation can be executed similarly by arranging the circuit shown in Fig. 3 such that the two selectors S can independently select C_R and M_R and such that both selectors can commonly select M_R.
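This second procedure can be sketched as follows, assuming that e_1 denotes the least significant bit of e. The function names are hypothetical, and `mont_mul` is again written from its definition rather than as a circuit model.

```python
def mont_mul(x: int, y: int, n: int, r: int) -> int:
    """Z = X*Y*R^-1 mod N, taken directly from the definition."""
    return (x * y * pow(r, -1, n)) % n


def mont_exp_lsb_first(m: int, e: int, n: int, r: int) -> int:
    """C = M^e mod N by scanning the exponent from bit e_1 up to e_t."""
    r_r = (r * r) % n
    m_r = mont_mul(m, r_r, n, r)
    c_r = mont_mul(1, r_r, n, r)
    t = e.bit_length()
    for i in range(1, t + 1):
        if (e >> (i - 1)) & 1:               # e_i = 1: multiply into the result
            c_r = mont_mul(c_r, m_r, n, r)
        if i < t:                            # square the running power of M
            m_r = mont_mul(m_r, m_r, n, r)
    return mont_mul(c_r, 1, n, r)
```

In this variant the multiplication and the squaring within one iteration do not depend on each other, so two multipliers could work on them at the same time.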

It is thus understood that both modular exponentiation and modular multiplication can be executed by using only one circuit which conducts the computation of the following formula (36).

$Z = X \cdot Y \cdot R^{-1} \bmod N$ (36)

It has also been shown that formula (36) can be computed by a simple repetition of the Montgomery modular multiplication shown in formula (27), on condition that the input values satisfy the requirement of formula (34) or (35).

Furthermore, the computation of formula (36) or formula (37) can be realized by various practical means, since such computation handles only integers. For instance, a CPU or a similar device may conveniently be employed in such computation.

It is therefore possible to construct various cryptosystems employing modular multiplication and modular exponentiation by using a common computing circuit and computing process which execute the computation of formula (36) or (37).

[Embodiment 1 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

A description will be given of the modular multiplication T_R = A_R·B_R·R^{-1} mod N (A_R, B_R < 2^{n+u}, R = 2^{n+r}, N < 2^n; n is an integer, and u and r meet the requirement of formula (34) or (35)). A_R can be expressed as follows when divided for every v bits. B_R, N and T_R can also be expressed as follows when they are respectively divided for every d bits. It is to be noted, however, that the following conditions must be satisfied:

n + r ≤ m·d, n + r ≤ k·v, X = 2^d and Y = 2^v (v ≤ d)

$A_R = A_{k-1} \cdot Y^{k-1} + A_{k-2} \cdot Y^{k-2} + \dots + A_1 \cdot Y + A_0$
$B_R = B_{m-1} \cdot X^{m-1} + B_{m-2} \cdot X^{m-2} + \dots + B_1 \cdot X + B_0$
$N = N_{m-1} \cdot X^{m-1} + N_{m-2} \cdot X^{m-2} + \dots + N_1 \cdot X + N_0$
$T_R = T_{m-1} \cdot X^{m-1} + T_{m-2} \cdot X^{m-2} + \dots + T_1 \cdot X + T_0$ (37)

In formula (37) shown above, A_i (i = 0, ..., k-1; A_i = 0 when n + u < i) represents a bit series which is obtained by dividing A_R at every v bits starting from the least significant bit. Similarly, B_j, N_j and T_j (j = 0, ..., m-1) represent bit series which are formed by dividing B_R, N and T_R at every d bits, respectively, from the least significant bit. In this case, the Montgomery modular multiplication is determined by executing the following computation from i = 0 to i = k. In the following formula, T_i indicates the value of T_R in the i-th cycle of computation, unlike T_j appearing in formula (37).

$T_i = (T_{i-1} + A_i \cdot B_R \cdot Y + M_{i-1} \cdot N)/Y$ (38)

wherein $M_{i-1} = (T_{i-1} \bmod Y) \cdot N_0' \bmod Y$, $T_{-1} = 0$ and $N_0' = N' \bmod Y$.
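The recurrence of formula (38) can be sketched in software as follows. This is an illustrative sketch under stated assumptions: the function name is hypothetical, A_k is taken as 0, the modulus is assumed odd, and the result is left unreduced (congruent to A_R·B_R·Y^{-k} modulo N, that is, R = Y^k).

```python
def mont_mul_digit_serial(a: int, b: int, n: int, k: int, v: int = 1) -> int:
    """Iterate T_i = (T_{i-1} + A_i*B*Y + M_{i-1}*N)/Y for i = 0 .. k, with Y = 2**v.

    Requires a < Y**k and an odd modulus n.
    """
    y = 1 << v
    n0_prime = (-pow(n, -1, y)) % y              # N_0' = N' mod Y = -N^-1 mod Y
    t = 0                                        # T_{-1} = 0
    for i in range(k + 1):
        a_i = (a >> (v * i)) & (y - 1) if i < k else 0   # A_i, with A_k = 0
        m_prev = ((t % y) * n0_prime) % y        # M_{i-1} depends only on T_{i-1}
        t = (t + a_i * b * y + m_prev * n) // y  # exact: T_{i-1} + M_{i-1}*N is a multiple of Y
    return t
```

Because the term A_i·B_R is delayed by one factor of Y, M_{i-1} can be derived from the low v bits of the previous value alone, before the new product has been added.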
In order to execute this computation by parallel processing, B_R and N are represented as follows by using B_j and N_j.

Algorithm L:

    For i = 0 to k
        M_{i-1} = dw_v(dw_v(T_{i-1,0})·N_0')
        For j = 0 to m - 1
            R_{i,j} = T_{i-1,j} + L_{i-2,j+1}·X/Y^2 + Y·A_i·B_j + M_{i-1}·N_j
            L_{i,j} = dw_v(R_{i,j})
            T_{i,j} = (dw_{d+v}(R_{i,j}) - L_{i,j})/Y
        Next
    Next

wherein dw_d(Z) = Z mod 2^d
        up_d(Z) = (Z - dw_d(Z))/2^d

All the initial values of T_{i,j} and L_{i,j} are zero.
In Algorithm L, multiplication and division by the constants X = 2^d and Y = 2^v in terms such as Y·A_i·B_j, L_{i-2,j+1}·X/Y^2 and T_{i,j} = (dw_{d+v}(R_{i,j}) - L_{i,j})/Y are realized by shifting bits with respect to other values. Thus, the computation with respect to T_{i,j} means that T_{i,j} is the value of the v-th bit to the (d+v-1)-th bit of R_{i,j} as counted from LSB. It is to be noted, however, that L_{i,j} is the value down to the (v-1)-th bit from LSB of R_{i,j}. Thus, division by Y, i.e., computation of 1/Y, is realized by a bit shift of every R_{i,j} towards the LSB. Therefore, L_{i-2,j+1} is used when computing R_{i,j}, with multiplication by X/Y^2 for obtaining figure matching.

Fig. 23 shows a circuit which executes Algorithm L. In this algorithm, i represents clocks, while j corresponds to the positions of the registers R in Fig. 23. Thus, the register on the right end as viewed in Fig. 23 is expressed as R_{i,0}, while the register on the left end is R_{i,m-1}.

A description will be given of the construction and operation of the circuit shown in Fig. 23. The following description assumes the case of v = 1 for simplicity of explanation. Referring to Fig. 23, B_j, N_j (j = 0, ..., m-1) and N_0' are respectively d-bit multipliers each having a multiplicator stored therein. Each multiplier is therefore realized by d AND gates. When N is an odd number, the multiplier for computing M_{i-1} can be dispensed with, because in such a case N_0' = 1. In this case, the LSB of T_{i-1,0} is outputted.

Inputs to and output from the adder, indicated by +, are as follows. The output M_{i-1}·N_j from the lower-place multiplier has d bits, and so does the output A_i·B_j of the upper-place multiplier. In order to double the value of the output A_i·B_j, this output is input with a one-bit shift to the upper place with respect to the output M_{i-1}·N_j. The input T_{i-1,j} from the register is inputted after the bits of R_{i-1,j} starting from the second bit as counted from LSB are shifted by one bit toward the lower place, so as to obtain figure matching with M_{i-1}·N_j. L_{i-2,j+1}·2^{d-2} means that the one-bit output L_{i-2,j+1} from the stage two ahead of the instant stage is inputted to the (d-1)-th bit of M_{i-1}·N_j as counted from LSB. In this case, a (d+3)-bit output is derived from the adder, on condition of T_{i-1,j} < 2^{d+2}. Thus, the registers which receive outputs from the adders are (d+3)-bit registers.

It is thus possible to execute the computation of formula (38) by the circuit shown in Fig. 23. Namely, Montgomery modular multiplication can be executed by inputting the numbers A_0 to A_k.

The foregoing description taken in conjunction with Fig. 23 is based on the assumption of v = 1. However, it will be clear that Montgomery modular multiplication can be executed by the same procedure for any value of v which meets the condition v ≤ d.

The described Embodiment 1 of the Montgomery modular multiplication circuit performs high-speed processing with a circuit of a very small scale.
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[Embodiment 2 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

In order to realize these computations by means of a systolic array, B_R and N are expressed by using B_j and N_j as follows.

Algorithm M:

    For i = 0 to k
        M_{i-1} = dw_v(dw_v(T_{i-1,0})·N_0')
        For j = 0 to m - 1
            R_{i,j} = T_{i-1,j} + C_{i,j-1} + L_{i-2,j+1}·X/Y^2 + Y·A_i·B_j + M_{i-1}·N_j
            L_{i,j} = dw_v(R_{i,j})
            T_{i,j} = (dw_{d+v}(R_{i,j}) - L_{i,j})/Y
            C_{i,j} = up_{d+v}(R_{i,j})
        Next
    Next

wherein dw_d(Z) = Z mod 2^d
        up_d(Z) = (Z - dw_d(Z))/2^d

The initial values of T_{i,j}, C_{i,j} and L_{i,j} are all zero.

In Algorithm M shown above, C_{i,j-1} is used as a carry when R_{i,j} is computed. Computations containing X and Y as constants, such as Y·A_i·B_j, L_{i-2,j+1}·X/Y^2 and T_{i,j} = (dw_{d+v}(R_{i,j}) - L_{i,j})/Y, are realized by shifting bits with respect to other values. Thus, the computation in regard to T_{i,j} means that the value of the v-th to (d+v-1)-th bits of R_{i,j} as counted from LSB is used as T_{i,j}.

It is to be noted that L_{i,j} is the value of R_{i,j} up to the (v-1)-th bit as counted from LSB. Thus, division by Y, i.e., computation of 1/Y, for obtaining T_{i,j} is realized by a bit shift toward the lower order place for every R_{i,j}. Therefore, L_{i-2,j+1} is used in computing R_{i,j}, with multiplication by X/Y^2 for obtaining figure matching.
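The bit-field operations dw and up, and the way one register value R_{i,j} is separated into L_{i,j}, T_{i,j} and C_{i,j}, can be sketched as follows. The helper names `dw`, `up` and `split_register` are hypothetical, and only non-negative integers are assumed.

```python
def dw(z: int, d: int) -> int:
    """dw_d(Z) = Z mod 2^d: the low d bits of Z."""
    return z & ((1 << d) - 1)


def up(z: int, d: int) -> int:
    """up_d(Z) = (Z - dw_d(Z)) / 2^d: everything above the low d bits."""
    return z >> d


def split_register(r: int, d: int, v: int) -> tuple[int, int, int]:
    """Split R into (L, T, C) with Y = 2^v, as in the inner loop of the algorithm."""
    low = dw(r, v)                       # L = dw_v(R): bits 0 .. v-1
    t = (dw(r, d + v) - low) >> v        # T = (dw_{d+v}(R) - L)/Y: bits v .. d+v-1
    c = up(r, d + v)                     # C = up_{d+v}(R): bits d+v and above
    return low, t, c


# With d = 4 and v = 2, R = 0b1_1011_01 splits into C = 1, T = 0b1011, L = 0b01.
low, t, c = split_register(0b1101101, d=4, v=2)
assert (low, t, c) == (0b01, 0b1011, 1)
assert (c << 6) | (t << 2) | low == 0b1101101   # the three fields reassemble R
```

All three operations are pure bit selections, which is why they cost only wiring in the circuit.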

Fig. 24 shows a circuit which computes R_{i,j}, L_{i,j}, T_{i,j} and C_{i,j} in Algorithm M. Fig. 25 shows a systolic array composed of a cascade connection of PEs (Processing Elements), each forming the circuit shown in Fig. 24. In the algorithm, j represents clocks, while i represents the positions of the PEs as viewed in Fig. 25. The PE on the right-hand end is expressed as i = 0 (#1), while the left-hand end PE is represented by i = k (#k+1).

Referring to Fig. 25, the (#i+1)-th PE contains a value A_i (i = 0, ..., k) stored in an internal register of the PE. Connections are made between the successive PEs through B_in and B_out, T_in and T_out, L_in and L_out, M_in and M_out, and N_in and N_out. The B_in and N_in of the #1 PE successively receive B_j and N_j (j = 0, ..., m-1), respectively, starting from the lower order figures of B_R and N, while 0 is set in the inputs T_in, L_in and M_in.
The construction and operation of the circuit shown in Fig. 24 will be described on an assumption of v = 

10 1, for the purpose of simplification of explanation. Referring to Fig. 24, a mark x indicates a multiplier which 
is realized by d pieces of AND. R, to R 3 are 1-bit registers which hold the values of A*, N,., and N 0 \ When N 
is an odd number, the multiplier for computing M k t and the register R 3 for holding No' can be dispensed with 
since in this case N 0 ' is 1. The register R2 holds the LSB of T± 1(0 . Registers R4 and R5 are d-bit registers which 
are adapted to delay the inputs from B ta and N h by one clock and to deliver the delayed inputs to the next PEs. 

Input to and output from the adder, indicated by +, are as follows. The output M_{i-1}·N_j from the lower-place
multiplier has d bits and so does the output A_i·B_j of the multiplier of the upper place. In order to double the
value of the output A_i·B_j, this output is input with a one-bit shift to the upper place with respect to the output
M_{i-1}·N_j. The input T_{i-1,j} from the preceding PE is inputted after the bits of R_{i-1,j} starting from the second bit as
counted from the LSB are shifted by one bit toward the lower place, so as to obtain figure matching with M_{i-1}·N_j.
L_{i-2,j+1}·2^{d-2} means that the one-bit output L_{i-2,j+1} from the PE which is ahead of the PE immediately ahead of
the instant PE is inputted to the (d-1)-th bit of M_{i-1}·N_j as counted from the LSB. In this case, if C_{i,j-1} as carry bits
meets the condition of C_{i,j-1} < 2^2, the output from the adder is of (d + 3) bits, so that C_{i,j} has two bits.
Therefore, the register R6 which receives the output from the adder has (d+3) bits.

It is thus understood that the formula (38) can be executed by a single PE shown in Fig. 24. 

By connecting (k+1) PEs in the form of a pipeline as shown in Fig. 25 and operating such connection of
PEs in synchronization with clocks, it is possible to execute a high-speed computation of Montgomery modular
multiplication.
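
The arithmetic that the pipeline carries out can be illustrated at the integer level. The following is a minimal sketch, assuming that the formula (38) has the form of the recurrence T_i = (T_{i-1} + A_i·B·Y + M_{i-1}·N)/Y with M_{i-1} = (T_{i-1} mod Y)·(-N^{-1} mod Y) mod Y, as recited in the claims. The function name and the test values are hypothetical, and the result is not reduced below N.

```python
def mont_mul_serial(a: int, b: int, n: int, v: int, k: int) -> int:
    """Integer-level model of the digit-serial recurrence (not of the PE hardware).

    Returns a value congruent to a * b * Y**(-k) mod n, where Y = 2**v.
    Assumes n is odd and a < 2**(v*k).
    """
    y = 1 << v
    n0 = (-pow(n, -1, y)) % y              # N0' = -N^{-1} mod Y (Python 3.8+)
    t = 0                                  # T_{-1} = 0
    for i in range(k + 1):                 # i = 0..k; A_k is 0 because a < Y**k
        a_i = (a >> (v * i)) & (y - 1)     # i-th v-bit section of A
        m = ((t % y) * n0) % y             # M_{i-1}, from the low v bits of T_{i-1}
        # T_{i-1} + M_{i-1}*N is a multiple of Y, so the shift is an exact division.
        t = (t + a_i * b * y + m * n) >> v
    return t
```

With N = 239, v = 2 and k = 4 (so that R = Y^k = 256), the call mont_mul_serial(123, 200, 239, 2, 4) agrees with 123·200·R^{-1} mod 239 after a final reduction modulo N.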

It is to be noted, however, that the following computation in accordance with an algorithm N is to be conducted
after completion of the Algorithm 2, in view of the fact that the final output from the array shown in Fig. 25 is
divided into the outputs T_{k,j} and L_{k,j} from the (k+1)-th PE and the output L_{k-1,j} of the k-th PE.



Algorithm N:

For j = 0 to m-1
    R_{k+1,j} = T_{k,j} + C_{k+1,j-1} + L_{k-1,j+1}·X/Y^2
    T_{k+1,j} = dw_d(R_{k+1,j})
    C_{k+1,j} = up_d(R_{k+1,j})
    T_{k+2,j} = T_{k+1,j} + C_{k+2,j-1} + L_{k,j+1}·X/Y
    C_{k+2,j} = up_d(T_{k+2,j})
Next


In this algorithm, T_{k+2,j} is the bit series T_j (j = 0, ..., m-1) formed by dividing T_R. The computation of R_{k+1,j}
is the same as the computation which is conducted in the algorithm 2 under the conditions of A_i = M_{i-1} = 0,
so that it can be executed by the PE shown in Fig. 24. The computation of T_{k+2,j} is substantially the same as
the computation of R_{k+1,j}. The division by 2, i.e., computation of 1/2, is not conducted in regard to T_{k+1,j} and
C_{k+1,j}. Therefore, L_{k,j+1} also is added to T_{k+1,j} with an upward one-bit shift with respect to the latter. Therefore,
in this Embodiment a one-bit half adder (HA) and a register R7 are provided in the adder of the PE shown in
Fig. 24 and below the LSB of the register R6. The half adder HA receives the LSB of the output R_{k+1,j} of the preceding
PE and C_{k+2,j-1} (C_{k+2,j-1} is at the greatest one bit in this state) and delivers the result of the addition to the newly
prepared register, using the carry bit as the carry to the adder. With this arrangement, L_{k,j+1} is automatically
shifted by one bit to the higher-order place when it is added. Thus, the PE for computing T_{k+2,j} has a construction [...]

[...] is met. It is, however, clear that Montgomery modular multiplication can be executed by a similar technique
also for the values of v which meet the condition of v ≤ d.

[Embodiment 3 of Montgomery Modular Multiplication/Modular Exponentiation Circuit]

The computation of the formula (37) can be executed in accordance with the following algorithm.

Algorithm O:

For i = 0 to k
    M_{i-1} = dw_v(dw_v(T_{i-1,1})·N_0')
    For j = 0 to m
        R_{i,j} = T_{i-1,j+1} + C_{i,j-1} + A_i·B_{j-1} + M_{i-1}·N_j
        T_{i,j} = dw_v(R_{i,j})
        C_{i,j} = up_v(R_{i,j})
    Next
Next

The value L_{i,j}, which is generated due to bit offset in the algorithm M, is not generated in the algorithm O [...].
The computation can be conducted by an array composed of (m + 1) PEs as shown in Fig. 28.

Although the description taken in conjunction with Fig. 27 is based on an assumption of v = d, it will be
clear that Montgomery modular multiplication can be executed by a similar method for any value of v which
meets the condition of v ≤ d.


[Embodiment 4 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

In the execution of the algorithm O shown in Embodiment 3, it is not necessary that A_i is set beforehand
in the PE. Namely, the arrangement may be such that A_i (i = 0, ..., k-1) are successively inputted, starting from the
lower-order place, in synchronization with N_j through A_in as shown in Fig. 28, and that A_in and A_out are connected
as shown in Fig. 30 so as to deliver the inputted A_i. In such an arrangement, A_i (i = 0, ..., k-1) are inputted one
clock in advance of B_j (j = 0, ..., m-1). It is therefore possible to conduct the computation A_0·B_j (j = 0, ..., m-1) for all
values of j, provided that A_0 is held in the register R1 simultaneously with the input of A_0 in the #1 PE. On the
other hand, B_j are input to the next PE with a 2-clock delay, whereas A_i is delayed only by one clock. Therefore,
if A_{i-1} is inputted and held in advance of B_j (j = 0, ..., m-1) in the #i-th PE, A_i is inputted and held in advance of
B_j (j = 0, ..., m-1) in the #(i+1)-th PE. It is therefore possible to execute the computation of A_i·B_j (j = 0, ..., m-1) in
the #(i+1)-th PE. Thus, the algorithm 4 can be realized by the PE and systolic array of Figs. 29 and 30, without
requiring change in the circuit scale and processing speed.

20 [Embodiment 5 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

Montgomery modular exponentiation can be executed by repeating computation of the formula (37). The
computation of the formula (37) can be performed by the PE shown in Figs. 27 and 29. By combining the PE with
a memory as shown in Fig. 31, Montgomery modular exponentiation can be executed by using a single PE a
plurality of times, the number of which is expressed by (3·t/2 + 2)·q, where q is the number of PEs necessary for
forming a Montgomery modular multiplication array. The number q is therefore (k+3) in the arrangement of Fig. 26 and
(k-2) in the arrangement shown in Figs. 27 and 29. Therefore, when p pieces of PE are employed, the number
of repetitions is given by (3·t/2 + 2)·q/p. Since the processing speed is in inverse proportion to the number of
repetitions, the method of Embodiment 5 can provide a processing speed proportional to the number of PEs
employed. Furthermore, the efficiency of the modular exponentiation circuit is never changed when the
processing speed is increased or when the scale of the circuit is decreased by varying the number of the PEs.
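
The factor (3·t/2 + 2) can be checked with a small software model. The sketch below assumes that t denotes the number of bits of the exponent e (this section does not define t) and that the exponent bits are scanned from the most significant end. The helper names mont and mont_pow are hypothetical; mont models one full Montgomery multiplication Z = U·V·R^{-1} mod N rather than the PE array itself.

```python
def mont(u: int, v: int, n: int, r_inv: int) -> int:
    """Model of one Montgomery multiplication: Z = U * V * R^-1 mod N."""
    return (u * v * r_inv) % n

def mont_pow(m: int, e: int, n: int, r: int):
    """Return (M**e mod N, number of Montgomery multiplications used)."""
    r_inv = pow(r, -1, n)                  # requires gcd(R, N) = 1 (Python 3.8+)
    r_r = (r * r) % n                      # R_R = R^2 mod N
    calls = 0
    m_r = mont(m, r_r, n, r_inv); calls += 1    # M_R = M * R mod N
    c_r = mont(r_r, 1, n, r_inv); calls += 1    # initial C_R = R mod N
    bits = bin(e)[2:]                      # most significant bit first
    for pos, bit in enumerate(bits):
        if bit == "1":
            c_r = mont(c_r, m_r, n, r_inv); calls += 1   # multiply step
        if pos < len(bits) - 1:
            c_r = mont(c_r, c_r, n, r_inv); calls += 1   # squaring step
    c = mont(c_r, 1, n, r_inv); calls += 1      # leave the Montgomery domain
    return c, calls
```

For a t-bit exponent with h bits equal to 1 the count is t + h + 2, which equals 3·t/2 + 2 when half of the bits are 1. Multiplying that count by q/p would give the number of repetitions stated above.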

It is therefore possible to construct an apparatus as shown in Fig. 32. In Fig. 32, SYMC (Systolic Modular
Exponentiation Chip) is a chip containing p pieces of cascade-connected PEs. The number p of the PEs can
be freely selected within the range of 1 ≤ p ≤ (3·t/2 + 2)·q. It is therefore possible to construct a chip of a
desired circuit scale. SYMC has a regularity in its circuit construction so that it can easily be formed into an
apparatus or in the form of a chip. The processing speed can be increased in proportion to the number of
SYMCs, by cascade-connecting a desired number of SYMCs as shown in Fig. 32. The change in the number
of SYMCs essentially requires a change in the number of the processing circuits for SYMC. This, however,
can easily be achieved by constructing a control circuit with an externally programmable ROM or the like device.

[Other Embodiments of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

In the algorithm of each of Embodiments described hereinbefore, the processing performed by each PE
is a simple integer computation. Therefore, it is not essential that PEs are constructed in the form of a chip.
Namely, Montgomery modular exponentiation apparatus can be realized without difficulty by an ordinary DSP
or a CPU.

The embodiments described hereinabove have regular circuit constructions, and controls and delays are
only local. These embodiments, therefore, are optimum for production in the form of VLSIs.

The cascade connection of the PEs in the described Embodiments is not exclusive. Namely, PEs may be
used as independent processing elements and may be controlled by a well-known microprogramming method
so as to realize modular multiplication and modular exponentiation.

In the Embodiments shown in Figs. 24, 26 and Figs. 27, 29, the PE executes all the processings for the
computation of the formula (38). It is to be understood that the present invention does not exclude a case in which
different portions of the formula (38) are computed by different processing elements so as to be finally
executed by these processing elements.

When the present invention is carried out by using a systolic array, it is possible to input control signals
together with the data. Therefore, the PE may be constructed in such a manner as to include a register for
transmitting control signals.



The modular exponentiation/modular multiplication circuit of the present invention can be built up in the
form of, for example, a VLSI when a specifically high speed of cryptosystem is required. The modular
exponentiation/modular multiplication circuit of the present invention has a regular structure realized by simple PEs.
In addition, the control of PEs and delay time in each PE are only local. The modular exponentiation/modular
multiplication circuit, therefore, is most suitable for construction and use in the form of a VLSI. Thus, the
present invention provides a high-speed cryptosystem.

When preference is given to reduction in size rather than to increase in the processing speed, such a demand
is met by constructing the modular exponentiation/modular multiplication circuit [...]

A demand also may arise to increase, after building up of a cryptosystem, the number of bits [...] This demand
also can be met without difficulty when the same circuit as that already [...]

[...] is conducted by using a [...] modular multiplication purely by multiplying operations in arrays 1 and 2 [...]
the time required for one clock is much shorter [...] shown in Fig. 26. Further, [...] fewer types of PEs than
the array 0, which facilitates standardization of the PEs [...] in the numbers of bits of the integers. Therefore,
the arrays 1 and 2 do not require any modification or rebuilding of the circuit such as SYMC for computations
employing different numbers of bits.

As will be understood from the foregoing description, the cryptic apparatus of the invention employing the 
modular multiplication/modular exponentiation circuit of any of the described Embodiments. 

[Embodiment 6 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

In Embodiment 6, the value T_i of T_R in the i-th computing cycle is determined as follows, unlike the
foregoing Embodiment in which T_i is determined by the formula (38).

    T_i = (T_{i-1}/Y + A_i·B_R) + M_i·N    (39)

wherein

    M_i = ((T_{i-1}/Y + A_i·B_R) mod Y)·N_0' mod Y,
    T_{-1} = 0 and N_0' = N' mod Y
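
As a check on the formula (39), the recurrence can be run on ordinary integers. This is a minimal sketch under the assumption that N_0' equals -N^{-1} mod Y (the form recited in the claims) and that the loop runs for i = 0 to k-1 as in the algorithm P; the function name is hypothetical and the result is not reduced below N.

```python
def mont_mul_39(a: int, b: int, n: int, v: int, k: int) -> int:
    """Integer-level model of T_i = (T_{i-1}/Y + A_i*B) + M_i*N, with Y = 2**v.

    Returns T_{k-1}/Y, which is congruent to a * b * Y**(-k) mod n.
    Assumes n is odd and a < 2**(v*k).
    """
    y = 1 << v
    n0 = (-pow(n, -1, y)) % y              # N0', assumed to be -N^{-1} mod Y
    t = 0                                  # T_{-1} = 0
    for i in range(k):                     # i = 0..k-1
        a_i = (a >> (v * i)) & (y - 1)     # i-th v-bit section of A
        s = (t >> v) + a_i * b             # T_{i-1}/Y + A_i*B (T_{i-1} is a multiple of Y)
        m = ((s % y) * n0) % y             # M_i, chosen so that s + m*n is a multiple of Y
        t = s + m * n                      # T_i
    return t >> v                          # final exact division by Y
```

Unlike the recurrence of the formula (38), M_i here is derived after A_i·B has been added, and the division by Y is applied to T_{i-1} at the start of the next cycle.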
In order to realize this computation function by parallel processing performed by a plurality of PEs, B_R and
N are decomposed into B_j and N_j as follows.

Algorithm P:

For i = 0 to k-1
    For j = 0 to m-1
        S_{i,j} = T_{i-1,j}/Y + A_i·B_j + C_{i,j-1}
        M_i     = dw_v(dw_v(S_{i,0})·N_0')
        R_{i,j} = S_{i,j} + M_i·N_j + L_{i-1,j+1}·X
        L_{i,j} = dw_v(R_{i,j})
        T_{i,j} = dw_{d+v}(R_{i,j}) - L_{i,j}
        C_{i,j} = up_{d+v}(R_{i,j})
    Next
Next

wherein dw_d(Z) = Z mod 2^d
        up_d(Z) = (Z - dw_d(Z))/2^d
        initial values of T_{i,j}, C_{i,j} and L_{i,j} are all zero
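
The operators dw and up, and the way R_{i,j} is split into L_{i,j}, T_{i,j} and C_{i,j}, can be written out as follows. This is only an illustrative sketch for non-negative integers; the function name split_r is hypothetical.

```python
def dw(z: int, d: int) -> int:
    """dw_d(Z) = Z mod 2**d: the low d bits of Z."""
    return z & ((1 << d) - 1)

def up(z: int, d: int) -> int:
    """up_d(Z) = (Z - dw_d(Z)) / 2**d: the bits of Z above the low d bits."""
    return z >> d

def split_r(r: int, d: int, v: int):
    """Split R into (L, T, C) as in the last three lines of the algorithm P."""
    l = dw(r, v)               # bits 0 .. v-1
    t = dw(r, d + v) - l       # bits v .. d+v-1, kept at their original position
    c = up(r, d + v)           # carry bits d+v and above
    return l, t, c
```

For example, with d = 4 and v = 2, split_r(0b1101101, 4, 2) yields (1, 44, 1), and L + T + C·2^{d+v} gives back the original value 109.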

In the algorithm P, C_{i,j-1} is used as a carry in the computation of S_{i,j}. At the same time, computations
involving X and Y as constants, e.g., L_{i-1,j+1}·X and T_{i-1,j}/Y, can be realized by shifting bits with respect to other
values. Thus, the computation concerning T_{i,j} means that the value of the v-th to (d+v-1)-th bits of R_{i,j} as counted
from the LSB is used as T_{i,j}. It is to be understood, however, that L_{i,j} is the value of the bits from the LSB to the
(v-1)-th bit of R_{i,j}. Thus, computation of 1/Y for determining T_{i,j} is realized by a bit shift towards the lower-order
place for every R_{i,j}. L_{i-1,j+1}, therefore, is used in the computation of R_{i,j} and is multiplied by X for the purpose
of figure matching.

Assuming that i and j are the numbers of the processing cycles and clocks, respectively, only S_{i,j} and R_{i,j} need
to be computed for each value of j in the algorithm P, since L_{i,j}, T_{i,j} and C_{i,j} can be realized only by bit shifts.
S_{i,j} and R_{i,j} can be realized by the same computation shown below:

    f = d/y + a·b + c·x    (40)

wherein y is 2^v or 1 and x is 2^d or 1.

The values x and y represent whether there is a bit shift or not. The formula (40), therefore, can be
computed by the PE shown in Fig. 33.

The construction and operation of the circuit shown in Fig. 33 will be described. In the following description,
v is assumed to be v = 1 for the purpose of simplification of explanation. In Fig. 33, x represents a multiplier
which can be realized by d AND gates. R1 is a one-bit register which holds a. S1 and S2 are selectors which
select between two modes, namely, a mode in which d or c is bit-shifted and a mode in which it is not
bit-shifted, in accordance with the values of y or x. An adder represented by + performs addition of the output
a·b from the multiplier and the outputs d/y and c·x from the selectors, thereby determining the value of f. R2
is a register which holds the output from the adder. It is thus understood that the formula (40) can be computed
by the PE of Fig. 33.
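
The role of the two selectors can be illustrated with a one-line model of the formula (40). In this sketch the parameters shift_d and shift_c are hypothetical stand-ins for the selector settings (y = 2^shift_d and x = 2^shift_c, with 0 meaning no shift); the function name is also an assumption.

```python
def pe40(d_in: int, a: int, b: int, c: int, shift_d: int = 0, shift_c: int = 0) -> int:
    """Model of f = d/y + a*b + c*x with y = 2**shift_d and x = 2**shift_c.

    d_in is assumed to be a multiple of y, so the right shift is an exact division.
    """
    return (d_in >> shift_d) + a * b + (c << shift_c)

# S-type use: the d input is shifted down (y = 2**v), the c input is not shifted.
s = pe40(12, 1, 5, 3, shift_d=1)       # 12/2 + 1*5 + 3
# R-type use: the d input is not shifted, the c input is shifted up (x = 2**d).
r = pe40(s, 1, 7, 1, shift_c=4)        # s + 1*7 + 1*16
```

Under this reading, S_{i,j} of the algorithm P would correspond to pe40(T_{i-1,j}, A_i, B_j, C_{i,j-1}, shift_d=v) and R_{i,j} to pe40(S_{i,j}, M_i, N_j, L_{i-1,j+1}, shift_c=d).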

Therefore, all the computations in the algorithm P can be executed by combining, as shown in Fig. 34,
a pair of PEs of the type shown in Fig. 33. It is to be noted that B_in and N_in in Fig. 34 successively receive B_j
and N_j (j = 0, ..., m-1), respectively, starting from the lower-order places. In the illustrated case, the left PE
computes S_{i,j}, while the right PE computes R_{i,j}. Since v is 1 (v = 1), N_0' is also 1 provided that N is an odd number,
so that the LSB of S_{i,0} forms M_i and is held in the register R1 of the right PE. R_{i,j} is shown by being decomposed
into C_{i,j}, T_{i,j} and L_{i,j}. It is therefore clear that the algorithm P can be executed by employing k pieces of the circuit
of Fig. 34 or 2k pieces of the PE of Fig. 33. It is thus understood that the computation of the formula (39) can be
efficiently done by parallel processing employing PEs of Fig. 33.

Although the foregoing description in regard to Figs. 33 and 34 is based on an assumption of v = 1, it will
be clear that Montgomery modular multiplication can be executed by a method similar to that described
hereinbefore, for any value of v which satisfies the condition of v ≤ d.

[Embodiment 7 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

The following systolic array can be constructed in order to execute the computation of the formula (39) in
accordance with the following algorithm.

Algorithm Q:

For i = 0 to k
    For j = 0 to m
        S_{i,j} = T_{i-1,j+1} + up_v(S_{i,j-1}) + A_i·B_j
        M_i     = dw_v(dw_v(S_{i,0})·N_0')
        R_{i,j} = dw_v(S_{i,j}) + up_v(R_{i,j-1}) + M_i·N_j
        T_{i,j} = dw_v(R_{i,j})
        C_{i,j} = up_v(R_{i,j})
    Next
Next

The algorithm Q is different from the algorithm P in that T_{i-1} of the formula (39) is realized by a clock
deviation rather than by a bit deviation. The value L_{i,j}, which is generated due to bit deviation in the algorithm P,
is not generated in the algorithm Q.

Figs. 35 and 36 show a PE and a systolic array which execute computation of R_{i,j}, T_{i,j} and C_{i,j} in the algorithm
Q. Also in the algorithm Q, the symbols j and i correspond to the clocks and to the positions of the PEs (that is,
the number of processing cycles), respectively. In Fig. 36, B_in and N_in respectively receive B_j and N_j
(j = 0, ..., m-1) starting from the lower-order place.

The following description is based upon an assumption of v = d, for the purpose of simplification of
explanation. Referring to Fig. 35, a symbol x indicates a multiplier for multiplying numbers each having d bits,
while + indicates an adder. Symbol R1 represents a register for holding A_i or M_i. A register R2 is a register which
holds the output of the adder. Values in the v-th and higher bits of this register are fed back as a carry to the
adder after a delay by one clock. It is therefore understood that the left and right PEs in Fig. 36 compute S_{i,j}
and R_{i,j}, respectively. In the meantime, M_i is multiplied by N_0' by an external multiplier and the product is
delivered to the right PE. It is thus understood that the PE of Fig. 35 is an efficient basic processing element for
realizing the algorithm Q by parallel processing.

Although the description taken in conjunction with Figs. 35 and 36 is based on an assumption of v = d,
it will be clear that Montgomery modular multiplication can be executed by a similar method for any value of
v which meets the condition of v ≤ d.


[Embodiment 8 of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

Montgomery modular exponentiation can be executed by repeating computation of the formula (39). The
computation of the formula (39) can be performed by the PE shown in Figs. 33 and 35. By combining the PE with
a memory as shown in Fig. 37, Montgomery modular exponentiation can be executed by using a single PE a
plurality of times, the number of which is expressed by (3·t/2 + 2)·q, where q is the number of PEs necessary for
forming a Montgomery modular multiplication array. The number q is therefore 2k in the arrangement of Fig. 33 and
2·(k+1) in the arrangement shown in Fig. 35. Therefore, when p pieces of PE are employed, the number of
repetitions is given by (3·t/2 + 2)·q/p. Since the processing speed is in inverse proportion to the number of
repetitions, the method of Embodiment 8 can provide a processing speed proportional to the number of PEs
employed. Furthermore, the efficiency of the modular exponentiation circuit is never changed when the processing
speed is increased or when the scale of the circuit is decreased by varying the number of the PEs.

It is therefore possible to construct an apparatus as shown in Fig. 38. In Fig. 38, MEC (Modular
Exponentiation Chip) is a chip containing p pieces of PEs. The number p of the PEs can be freely selected within the
range of 1 ≤ p ≤ (3·t/2 + 2)·q. It is therefore possible to construct a chip of a desired circuit scale. MEC has a
regularity in its circuit construction so that it can easily be formed into an apparatus or in the form of a chip.
The processing speed can be increased in proportion to the number of MECs, by using a desired number of
MECs as shown in Fig. 38. The change in the number of MECs essentially requires a change in the number
of the processing circuits for MECs. This, however, can easily be achieved by constructing a control circuit
with an externally programmable ROM or the like device.

[Other Embodiments of Montgomery Modular Multiplication/Modular Exponentiation Circuit] 

In the algorithm of each of Embodiments described hereinbefore, the processing performed by each PE
is a simple integer computation. Therefore, it is not essential that PEs are constructed in the form of a chip.
Namely, Montgomery modular exponentiation apparatus can be realized without difficulty by an ordinary DSP
or a CPU.

The Embodiments described hereinabove have regular circuit constructions, and controls and delays are
only local. These Embodiments, therefore, are optimum for production in the form of VLSIs.

It is also possible to use the combination of PEs of Figs. 33 and 35 as a single PE.

The modular exponentiation/modular multiplication circuit of the present invention can be built up in the
form of, for example, a VLSI when a specifically high speed of cryptosystem is required. The modular
exponentiation/modular multiplication circuit of the present invention has a regular structure realized by simple PEs.
In addition, the control of PEs and delay time in each PE are only local. The modular exponentiation/modular
multiplication circuit, therefore, is most suitable for construction and use in the form of a VLSI. Thus, the
present invention provides a high-speed cryptosystem.

When preference is given to reduction in size rather than to increase in the processing speed, such a demand
is met by constructing the modular exponentiation/modular multiplication circuit of the invention with a smaller
number of PEs. Such an arrangement also makes it easy to form a circuit without impairing the features provided
by the use of PEs, i.e., regularity of the structure and locality of control and delay time. Furthermore, since the
computation performed in each PE is a simple integer computation, a simple cryptosystem can be realized by a
software-type approach employing a CPU or a DSP which performs the computing process in accordance with
the present invention.

A demand may arise for increase in the processing speed of a built-up cryptosystem of the invention
employing a small-scale circuit such as MEC composed of several pieces of PE. Such a demand can easily be
met by using a plurality of such small-scale circuits. Thus, according to the invention, it is possible to obtain a
cryptosystem in which a desired increase in the processing speed can be attained simply by adding a module
circuit, without requiring modification or alteration of the construction of such cryptic apparatus.

A demand also may arise to increase, after building up of a cryptosystem, the number of bits of the integers
employed in the computation for the purpose of enhancing the effectiveness of the cryptosystem. This demand
also can be met without difficulty when the same circuit as that already used in the system or a similar circuit
containing an increased number of PEs is added to the existing cryptosystem. This owes to the fact that the
modular exponentiation/modular multiplication circuit of the present invention enables an easy trade-off
between the circuit scale and the number of the processing cycles, i.e., to the fact that any change in the bit
numbers of integers employed in the computation can be dealt with by a change in the number of the
processing cycles. It is therefore not necessary to rebuild the cryptic apparatus when there is a demand for increase
in the cryptographic strength of the system. The same applies to the case where the numbers of bits of the
integers are to be reduced. Namely, reduction in the bit numbers of the integers can easily be achieved without
requiring rebuilding of the cryptic apparatus.

The described advantages brought about by the present invention cannot be achieved by known modular
exponentiation/modular multiplication cryptographic schemes which do not employ parallel processing in an
efficient manner. It is to be understood that a flexible and expandable cryptosystem can be realized by the
circuit and method carrying out the modular exponentiation/modular multiplication scheme in accordance with
the present invention.

Although the present invention has been described in its preferred form with a certain degree of
particularity, many apparently widely different embodiments of the invention can be made without departing from the
spirit and scope thereof. It is to be understood that the invention is not limited to the specific embodiments
thereof except as defined in the appended claims.

As will be appreciated by those skilled in the art, there are various preferable conditions for the numbers
R and N. Further details may be found in the above-mentioned articles by Montgomery and by Even, which
are incorporated herein by reference. Preferably R and N are not divisible by each other. Preferably R and N
have no common factors.

Preferably, N is odd and R equals 2^n where n is the number of bits in the binary representation of N.



Claims 

1. A cryptic communication method using a communication apparatus which performs encryption or
decryption of a communication data by executing a modular multiplication A·B mod N of integers A and B by using
N as the modulus, said communication apparatus having at least one computing unit which computes and
outputs Z = U·V·R^{-1} mod N by using an integer R which is prime to N, said method comprising the steps
of:

    inputting to one of said computing units A and a constant R_R which is expressed by R_R = R^2 mod
N, thereby causing the computing unit to output A_R = A·R_R·R^{-1} mod N;

    inputting to one of said computing units B and said constant R_R, thereby causing the computing
unit to output B_R = B·R_R·R^{-1} mod N;

    inputting to said computing unit the A_R and B_R, thereby causing
said computing unit to output T_R = A_R·B_R·R^{-1} mod N; and

    inputting to said computing unit the T_R and a constant 1, thereby causing said computing unit to
output, as the Q, T_R·1·R^{-1} mod N, whereby said modular multiplication Q = A·B mod N is executed.
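
The four steps recited above can be traced with a small software model. This is a minimal sketch in which the computing unit is modeled by a hypothetical function unit that evaluates Z = U·V·R^{-1} mod N directly; it illustrates the sequence of steps only and is not the circuit of the embodiments.

```python
def unit(u: int, v: int, n: int, r: int) -> int:
    """Model of the computing unit: Z = U * V * R^-1 mod N (needs gcd(R, N) = 1)."""
    return (u * v * pow(r, -1, n)) % n     # pow with exponent -1 needs Python 3.8+

def modmul(a: int, b: int, n: int, r: int) -> int:
    """Q = A * B mod N obtained with four uses of the computing unit."""
    r_r = (r * r) % n                      # constant R_R = R^2 mod N
    a_r = unit(a, r_r, n, r)               # A_R = A * R mod N
    b_r = unit(b, r_r, n, r)               # B_R = B * R mod N
    t_r = unit(a_r, b_r, n, r)             # T_R = A * B * R mod N
    return unit(t_r, 1, n, r)              # Q = T_R * R^-1 mod N
```

With the illustrative values N = 239 and R = 256, modmul(123, 200, 239, 256) returns the same value as (123·200) mod 239.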

2. A cryptic communication method using a communication apparatus which performs encryption or
decryption of a communication data by using a modular exponentiation C = M^e mod N concerning integers M
and e using N as the modulus, said communication apparatus having at least one communication unit
which computes and outputs Z = U·V·R^{-1} mod N by using, with respect to input data U and V, an integer
R which is prime to N, said method comprising the steps of:

    inputting to one of said computing units M and a constant R_R which is expressed by R_R = R^2 mod
N, thereby causing said computing unit to output M_R = M·R_R·R^{-1} mod N;

    representing the binary expression of e by e = [e_t, e_{t-1}, ..., e_1], determining the values of e_i starting
from the lowest order bit;

    representing the initial value of C_R by R_R·R^{-1} mod N, inputting C_R and M_R to one of said computing
units when e_i is determined to be equal to e_i = 1, thereby causing said computing unit to output C_R·M_R·R^{-1}
mod N as a new C_R;

    determining whether i of said e_i is greater than 1 or not;

    inputting, when i is greater than 1, C_R as two input data to one of said computing units, thereby
causing said computing unit to output, as new value of C_R, C_R·C_R·R^{-1} mod N; and

    after completion of processing on all e_i, inputting the C_R and 1 as a constant to one of said
computing units, thereby causing said computing unit to output, as the aimed C, C = C_R·1·R^{-1} mod N, whereby
said modular exponentiation C = M^e mod N is executed.

3. A cryptic communication method using a communication apparatus which performs encryption or
decryption of a communication data by using a modular exponentiation C = M^e mod N concerning integers M
and e using N as the modulus, said communication apparatus having at least one communication unit
which computes and outputs Z = U·V·R^{-1} mod N by using, with respect to input data U and V, an integer
R which is prime to N, said method comprising the steps of:

    inputting to one of said computing units M and a constant R_R which is expressed by R_R = R^2 mod
N, thereby causing said computing unit to output M_R = M·R_R·R^{-1} mod N;

    representing the binary expression of e by e = [e_t, e_{t-1}, ..., e_1], determining the values of e_i starting
from the highest order bit;

    representing the initial value of C_R by R_R·R^{-1} mod N, inputting C_R and M_R to one of said computing
units when e_i is determined to be equal to e_i = 1, thereby causing said computing unit to output C_R·M_R·R^{-1}
mod N as a new C_R;

    determining whether i of said e_i is greater than 1 or not;

    inputting, when i is smaller than 1, M_R as two input data to one of said computing units, thereby
causing said computing unit to output, as new value of M_R, M_R·M_R·R^{-1} mod N; and

    after completion of processing on all e_i, inputting the C_R and 1 as a constant to one of said
computing units, thereby causing said computing unit to output, as the aimed C, C = C_R·1·R^{-1} mod N, whereby
said modular exponentiation C = M^e mod N is executed.

4. A cryptic communication method according to Claim 2 or 3, further comprising the step of inputting to
one of said computing units a constant R_R and a constant 1, thereby obtaining R_R·1·R^{-1} mod
N as the initial value of C_R.

5. A cryptic communication method according to one of the preceding claims, wherein said constant R and
said input data U and V meet the conditions of: R = 2^{n+r}, U < 2^{n+u} and V < 2^{n+u} for values u
and r which meet either the conditions of u = 1 and r > 1 or the conditions of u > 1 and r = u + 1, where
n meets the condition of N < 2^n.

6. A cryptic communication method which employs encryption or decryption of a communication data by
employing a modular multiplication Q = A·B mod N for input integers A and B using N as the modulus,
said method comprising the steps of:

    computing A·R mod N using the input A and an integer R which is prime to N, thus determining A_R
as the computation result;

    computing B·R mod N using the input B and said R, thus determining B_R as the computation result;

    computing A_R·B_R·R^{-1} mod N on the basis of said computing results A_R and B_R and said R, thus
determining T_R as the computation result; and

    computing T_R·R^{-1} mod N on the basis of said T_R and said R, thus determining said Q as the
computation result;

    wherein the computation for determining said T_R is executed by successively computing:

        T_i = (T_{i-1} + A_i·B_R·Y + M_{i-1}·N)/Y
        M_{i-1} = (T_{i-1} mod Y)·(-N^{-1} mod Y) mod Y

    wherein Y equals 2^v and A_i are sections of A_R obtained by dividing A_R for every v bits, where v
is an optional integer.

7. A cryptic communication method according to Claim 6, wherein each of the successive computations for
determining T_R is executed by a single processing element, and the whole of the successive computations
are performed by a pipe-line processing.

8. A cryptic communication method according to Claim 6, wherein multiplication or division by Y in the
successive computations for determining T_R is performed by addition with a bit shift.

9. A cryptic communication method according to Claim 6, wherein the successive computations for
determining T_R is conducted by computing A_i·B_j and M_{i-1}·N_j, where B_j and N_j are sections of the B_R and N
obtained by dividing the B_R and N at every d bits, and adding the computation result to the result T_{i-1} of
computation of the preceding computing cycle.

10. A cryptic communication method which employs encryption or decryption of a communication data by
employing a modular multiplication Q = A·B mod N for input integers A and B using N as the modulus,
said method comprising the steps of:

    computing A·R mod N using the input A and an integer R which is prime to N, thus determining A_R
as the computation result;

    computing B·R mod N using the input B and said R, thus determining B_R as the computation result;

    computing A_R·B_R·R^{-1} mod N on the basis of said computing results A_R and B_R and said R, thus
determining T_R as the computation result; and

    computing T_R·R^{-1} mod N on the basis of said T_R and said R, thus determining said Q as the
computation result;

    wherein the computation for determining said T_R is executed by successively computing:

        T_i = (T_{i-1}/Y + A_i·B_R) + M_i·N
        M_i = ((T_{i-1}/Y + A_i·B_R) mod Y)·(-N^{-1} mod Y) mod Y

    wherein Y equals 2^v and A_i are sections of A_R obtained by dividing A_R for every v bits, where v
is an optional integer.

11. A communication apparatus which performs encryption or decryption of communication data by
executing a modular multiplication Q = A·B mod N of integers A and B by using N as the modulus, said
communication apparatus comprising:

first computing means for computing A_R = A·R_R·R^{-1} mod N, upon receipt of A and a constant R_R
which is expressed by R_R = R^2 mod N, where R is an integer prime to N;

second computing means for computing B_R = B·R_R·R^{-1} mod N upon receipt of said constant R_R
and B;

third computing means for computing T_R = A_R·B_R·R^{-1} mod N upon receipt of A_R and B_R output from
said first and second computing means; and

fourth computing means for computing T_R·1·R^{-1} mod N and outputting the computation result as
said Q, upon receipt of T_R output from said third computing means and a constant 1.
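By way of illustration only, the four computing means may be modelled as four calls to a single primitive that returns U·V·R^{-1} mod N. In the following Python sketch the names make_mont and mod_mul are hypothetical, and the primitive is written in a direct reference form using a precomputed inverse of R rather than as the hardware circuit; it assumes only that R is prime to N.

```python
def make_mont(n, r):
    """Return a function computing u*w*R^-1 mod n (reference form only)."""
    r_inv = pow(r, -1, n)            # exists because gcd(r, n) == 1 (Python 3.8+)

    def mont(u, w):
        return u * w * r_inv % n

    return mont


def mod_mul(a, b, n, r):
    """Compute a*b mod n using only the primitive u*w*R^-1 mod n."""
    mont = make_mont(n, r)
    r_r = r * r % n                  # constant R_R = R^2 mod N
    a_r = mont(a, r_r)               # first means:  A_R = A*R_R*R^-1 = A*R mod N
    b_r = mont(b, r_r)               # second means: B_R = B*R_R*R^-1 = B*R mod N
    t_r = mont(a_r, b_r)             # third means:  T_R = A_R*B_R*R^-1 mod N
    return mont(t_r, 1)              # fourth means: T_R*1*R^-1 = A*B mod N
```

For example, mod_mul(123, 456, 1009, 1024) gives the same value as 123 * 456 % 1009, since every step is the same primitive applied to different operands.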

12. A communication apparatus which performs encryption or decryption of communication data by using
a modular exponentiation C = M^e mod N concerning integers M and e using N as the modulus, said
communication apparatus comprising:

first computing means for computing M_R = M·R_R·R^{-1} mod N upon receipt of M and a constant R_R
which is expressed by R_R = R^2 mod N, where R is an integer prime to N;

first determining means for determining the values of e_i starting from the highest order bit, wherein
the binary expression of e is expressed by e = [e_t, e_{t-1}, ..., e_1];

storage means for updating and storing the value of C_R by using C_R = R_R·R^{-1} mod N as the initial
value;

second computing means which receives the C_R stored in said storage means and M_R computed
by said first computing means when e_i is determined to be equal to 1, thereby causing said computing
unit to output C_R·M_R·R^{-1} mod N as a new C_R;

second determining means for determining whether i of e_i is greater than 1;

third computing means for receiving C_R when i is determined by said second determining means
to be greater than 1, and outputting, as a new value of C_R, C_R·C_R·R^{-1} mod N;
and

fourth computing means which computes, upon receipt of C_R stored in said storage means and 1
as a constant, C = C_R·1·R^{-1} mod N after completion of computations performed by said second and third
computing means on all the values of e_i, thereby outputting the computation result as said C.
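By way of illustration only, the sequence of operations performed by the means recited above may be sketched in Python as follows. The function name mod_exp_msb_first and the inner helper mont are hypothetical, t is assumed to be the bit length of e, and the primitive U·V·R^{-1} mod N is written in a direct reference form rather than as the hardware circuit.

```python
def mod_exp_msb_first(m, e, n, r):
    """Compute m**e mod n, scanning the bits of e from the highest order bit."""
    r_inv = pow(r, -1, n)                # requires gcd(r, n) == 1 (Python 3.8+)

    def mont(u, w):                      # reference form of u*w*R^-1 mod n
        return u * w * r_inv % n

    r_r = r * r % n                      # constant R_R = R^2 mod N
    m_r = mont(m, r_r)                   # first computing means: M_R
    c_r = mont(r_r, 1)                   # initial C_R = R_R*R^-1 mod N
    t = e.bit_length()                   # assumed number of bits e_t ... e_1
    for i in range(t, 0, -1):            # i = t, t-1, ..., 1
        if (e >> (i - 1)) & 1:           # e_i == 1: multiply step
            c_r = mont(c_r, m_r)
        if i > 1:                        # i > 1: squaring step
            c_r = mont(c_r, c_r)
    return mont(c_r, 1)                  # fourth computing means: C = C_R*1*R^-1
```

The squaring is skipped for the lowest order bit, so that the value held in C_R after the loop corresponds to M^e·R mod N.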

13. A communication apparatus which performs encryption or decryption of a communication content by
using a modular exponentiation C = M^e mod N concerning integers M and e using N as the modulus, said
communication apparatus comprising:

first computing means for computing M_R = M·R_R·R^{-1} mod N upon receipt of M and a constant R_R
which is expressed by R_R = R^2 mod N;

first determining means for determining the values of e_i starting from the lowest order bit, wherein
the binary expression of e is expressed by e = [e_t, e_{t-1}, ..., e_1];

first storage means for updating and storing the value of C_R by using C_R = R_R·R^{-1} mod N as the
initial value;

second storage means for updating and storing the value of M_R using the output of said first
computing means as the initial value;

second computing means which receives the C_R stored in said first storage means and M_R
computed by said first computing means when e_i is determined to be equal to 1, thereby causing said
computing unit to output C_R·M_R·R^{-1} mod N as a new C_R;

second determining means for determining whether i of e_i is smaller than t;

third computing means for receiving M_R stored in said second storage means when i is determined
by said second determining means to be smaller than t, and outputting, as a new value of M_R, M_R·M_R·R^{-1}
mod N; and

fourth computing means which computes, upon receipt of C_R stored in said first storage means and
1 as a constant, C = C_R·1·R^{-1} mod N after completion of computations performed by said second and
third computing means on all the values of e_i, thereby outputting the computation result as said C.
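By way of illustration only, the lowest-order-bit-first procedure may be sketched in Python as follows. The function name mod_exp_lsb_first and the inner helper mont are hypothetical, t is assumed to be the bit length of e, and the multiply step is assumed to use the current content of the second storage means. The primitive U·V·R^{-1} mod N is again written in a direct reference form.

```python
def mod_exp_lsb_first(m, e, n, r):
    """Compute m**e mod n, scanning the bits of e from the lowest order bit."""
    r_inv = pow(r, -1, n)                # requires gcd(r, n) == 1 (Python 3.8+)

    def mont(u, w):                      # reference form of u*w*R^-1 mod n
        return u * w * r_inv % n

    r_r = r * r % n                      # constant R_R = R^2 mod N
    m_r = mont(m, r_r)                   # second storage: initial M_R
    c_r = mont(r_r, 1)                   # first storage: initial C_R = R_R*R^-1
    t = e.bit_length()                   # assumed number of bits e_t ... e_1
    for i in range(1, t + 1):            # i = 1, 2, ..., t
        if (e >> (i - 1)) & 1:           # e_i == 1: multiply C_R by current M_R
            c_r = mont(c_r, m_r)
        if i < t:                        # i < t: square the stored M_R
            m_r = mont(m_r, m_r)
    return mont(c_r, 1)                  # fourth computing means: C = C_R*1*R^-1
```

In this ordering the multiply step and the squaring step of one iteration act on different stored values, so they do not depend on each other within that iteration.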

14. A computing apparatus in which modular multiplication is performed by a plurality of processing elements
of the same type so as to facilitate integration of the computing circuit.

15. A modular multiplication or exponentiation method or apparatus in which a plurality of modular multipli- 
cation operations or bit-shift divisions are carried out, to a modulus N, with an input value and a prede- 
termined value. 

16. A method or apparatus according to claim 15 in which the predetermined value is a power of R or the
bit-shift division comprises division by R, where R and N have no common factor.

17. A circuit arranged to perform high-speed modular exponentiation and high-speed modular multiplication
in accordance with the Montgomery method, so as to enable the scale of the circuit to be reduced.

[Drawings: FIGS. 4-12, 14-16, 18, 21, 24, 26 and 29-33. Labels recoverable from the drawings include processing elements PE #1, #2, ..., #m+1 with A, B, T, M and N inputs and outputs, registers R1 to R9, and MEMORY, CONTROL and RAM blocks.]